{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<style>\n",
    "pre {\n",
    " white-space: pre-wrap !important;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(odd) {\n",
    "    background-color: #f9f9f9;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(even) {\n",
    "    background-color: white;\n",
    "}\n",
    ".table-striped td, .table-striped th, .table-striped tr {\n",
    "    border: 1px solid black;\n",
    "    border-collapse: collapse;\n",
    "    margin: 1em 2em;\n",
    "}\n",
    ".rendered_html td, .rendered_html th {\n",
    "    text-align: left;\n",
    "    vertical-align: middle;\n",
    "    padding: 4px;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning (advanced): the Titanic dataset\n",
    "\n",
    "If you want to try out this notebook with a live Python kernel, use mybinder:\n",
    "\n",
    "<a class=\"reference external image-reference\" href=\"https://mybinder.org/v2/gh/vaexio/vaex/latest?filepath=docs%2Fsource%2Fexample_ml_titanic.ipynb\"><img alt=\"https://mybinder.org/badge_logo.svg\" src=\"https://mybinder.org/badge_logo.svg\" width=\"150px\"></a>\n",
    "\n",
    "\n",
    "The following is a more involved machine learning example, in which we will use a larger variety of methods in `vaex` to do data cleaning, feature engineering, pre-processing, and finally to train a couple of models. To do this, we will use the well-known _Titanic dataset_. Our task is to predict which passengers are more likely to have survived the disaster.\n",
    "\n",
    "Before we begin, there are two important notes to consider:\n",
    " - The following example is not meant to provide a competitive score for any competitions that might use the _Titanic dataset_. Its primary goal is to show how various methods provided by `vaex` and `vaex.ml` can be used to clean data, create new features, and do general data manipulations in a machine learning context.\n",
    " - While the _Titanic dataset_ is rather small in size, all the methods and operations presented in the solution below will work on a dataset of arbitrary size, as long as it fits on the hard drive of your machine.\n",
    " \n",
    "Now, with that out of the way, let's get started!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.005009Z",
     "start_time": "2020-05-01T17:12:35.667407Z"
    }
   },
   "outputs": [],
   "source": [
    "import vaex\n",
    "import vaex.ml\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adjusting `matplotlib` parameters\n",
    "\n",
    "_Intermezzo:_ we modify some of the `matplotlib` default settings, just to make the plots a bit more legible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.014957Z",
     "start_time": "2020-05-01T17:12:37.007951Z"
    }
   },
   "outputs": [],
   "source": [
    "SMALL_SIZE = 12\n",
    "MEDIUM_SIZE = 14\n",
    "BIGGER_SIZE = 16\n",
    "\n",
    "plt.rc('font', size=SMALL_SIZE)          # controls default text sizes\n",
    "plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title\n",
    "plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels\n",
    "plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels\n",
    "plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels\n",
    "plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize\n",
    "plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First of all we need to read in the data. Since the _Titanic dataset_ is quite well known for trying out different classification algorithms, as well as commonly used as a teaching tool for aspiring data scientists, it ships (no pun intended) together with `vaex.ml`. So let's read it in, see the description of its contents, and get a preview of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.069863Z",
     "start_time": "2020-05-01T17:12:37.017532Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.vaex-description pre {\n",
       "          max-width : 450px;\n",
       "          white-space : nowrap;\n",
       "          overflow : hidden;\n",
       "          text-overflow: ellipsis;\n",
       "        }\n",
       "\n",
       "        .vex-description pre:hover {\n",
       "          max-width : initial;\n",
       "          white-space: pre;\n",
       "        }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div><h2>titanic</h2> <b>rows</b>: 1,309</div><div><b>path</b>: <i>/Users/jovan/Work/vaex/packages/vaex-core/vaex/ml/datasets/titanic.hdf5</i></div><div><b>Description</b>: file exported by vaex, by user jovan, on date 2019-07-04 11:02:26.996867, from source /has/no/path/pandasprevious description:\n",
       "\n",
       "The Titanic dataset. \n",
       "A classic dataset used in many data mining tutorials and demos. \n",
       "Perfect for exploratory analysis and building binary classification models to predict survival.\n",
       "\n",
       "Data covers passengers only, not crew.\n",
       "\n",
       "Column description:\n",
       "pclass = passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)\n",
       "survived = Survival (False = No; True = Yes)\n",
       "name = Name\n",
       "sex = Sex\n",
       "sibsp = Number of Siblings/Spouses Aboard\n",
       "parch = Number of Parents/Children Aboard\n",
       "ticket = Ticket Number\n",
       "fare = Passenger Fare\n",
       "cabin = Cabin\n",
       "embarked = Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)\n",
       "boat = Lifeboat (if survived)\n",
       "body = Body number (if did not survive and body was recovered)\n",
       "home_dest = Passenger destination\n",
       "</div><h2>Columns:</h2><table class='table-striped'><thead><tr><th>column</th><th>type</th><th>unit</th><th>description</th><th>expression</th></tr></thead><tr><td>pclass</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>survived</td><td>bool</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>name</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>sex</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>age</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>sibsp</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>parch</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>ticket</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>fare</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>cabin</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>embarked</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>boat</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>body</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>home_dest</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr></table><h2>Data:</h2><table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                           </th><th>sex   </th><th>age   </th><th>sibsp  </th><th>parch  </th><th>ticket  </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                      </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>1       </td><td>True      </td><td>Allen, Miss. Elisabeth Walton                  </td><td>female</td><td>29.0  </td><td>0      </td><td>0      </td><td>24160   </td><td>211.3375</td><td>B5     </td><td>S         </td><td>2     </td><td>nan   </td><td>St Louis, MO                   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>True      </td><td>Allison, Master. Hudson Trevor                 </td><td>male  </td><td>0.9167</td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>11    </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>1       </td><td>False     </td><td>Allison, Miss. Helen Loraine                   </td><td>female</td><td>2.0   </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>1       </td><td>False     </td><td>Allison, Mr. Hudson Joshua Creighton           </td><td>male  </td><td>30.0  </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>None  </td><td>135.0 </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>1       </td><td>False     </td><td>Allison, Mrs. Hudson J C (Bessie Waldo Daniels)</td><td>female</td><td>25.0  </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                            </td><td>...   </td><td>...   </td><td>...    </td><td>...    </td><td>...     </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,304</i></td><td>3       </td><td>False     </td><td>Zabour, Miss. Hileni                           </td><td>female</td><td>14.5  </td><td>1      </td><td>0      </td><td>2665    </td><td>14.4542 </td><td>None   </td><td>C         </td><td>None  </td><td>328.0 </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,305</i></td><td>3       </td><td>False     </td><td>Zabour, Miss. Thamine                          </td><td>female</td><td>nan   </td><td>1      </td><td>0      </td><td>2665    </td><td>14.4542 </td><td>None   </td><td>C         </td><td>None  </td><td>nan   </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,306</i></td><td>3       </td><td>False     </td><td>Zakarian, Mr. Mapriededer                      </td><td>male  </td><td>26.5  </td><td>0      </td><td>0      </td><td>2656    </td><td>7.225   </td><td>None   </td><td>C         </td><td>None  </td><td>304.0 </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,307</i></td><td>3       </td><td>False     </td><td>Zakarian, Mr. Ortin                            </td><td>male  </td><td>27.0  </td><td>0      </td><td>0      </td><td>2670    </td><td>7.225   </td><td>None   </td><td>C         </td><td>None  </td><td>nan   </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,308</i></td><td>3       </td><td>False     </td><td>Zimmerman, Mr. Leo                             </td><td>male  </td><td>29.0  </td><td>0      </td><td>0      </td><td>315082  </td><td>7.875   </td><td>None   </td><td>S         </td><td>None  </td><td>nan   </td><td>None                           </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Load the titanic dataset\n",
    "df = vaex.ml.datasets.load_titanic()\n",
    "\n",
    "# See the description\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shuffling\n",
    "From the preview of the DataFrame we notice that the data is sorted by passenger class and, within each class, alphabetically by name.\n",
    "Thus we need to shuffle it before we split it into train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.078118Z",
     "start_time": "2020-05-01T17:12:37.072165Z"
    }
   },
   "outputs": [],
   "source": [
    "# The dataset is ordered, so let's shuffle it\n",
    "df = df.sample(frac=1, random_state=31)"
   ]
  },
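  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea behind the shuffle concrete, here is a minimal plain-Python sketch of what a full shuffle with a fixed seed amounts to: a reproducible random permutation of the row order. It only illustrates the concept and says nothing about how `vaex` implements `sample` internally; the `rows` list and the `shuffle_rows` helper are hypothetical names made up for this example.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def shuffle_rows(rows, seed):\n",
    "    # Return a shuffled copy of rows; the input list is left untouched.\n",
    "    rng = random.Random(seed)  # private generator, so the seed alone fixes the order\n",
    "    shuffled = list(rows)\n",
    "    rng.shuffle(shuffled)\n",
    "    return shuffled\n",
    "\n",
    "\n",
    "# Hypothetical, alphabetically sorted stand-in for the real rows\n",
    "rows = ['Allen', 'Allison', 'Zabour', 'Zakarian', 'Zimmerman']\n",
    "first = shuffle_rows(rows, seed=31)\n",
    "second = shuffle_rows(rows, seed=31)\n",
    "```\n",
    "\n",
    "Because both calls use the same seed, `first` and `second` hold the same permutation, which is the property that makes a seeded shuffle reproducible across runs."
   ]
  },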
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shuffling for large datasets\n",
    "As mentioned in [The ML introduction tutorial](tutorial_ml_intro.ipynb), shuffling large datasets in-memory is not a good idea. In case you work with a large dataset, consider shuffling while exporting:\n",
    "\n",
    "```\n",
    "df.export(\"shuffled.hdf5\", shuffle=True)\n",
    "df = vaex.open(\"shuffled.hdf5\")\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Split into train and test\n",
    "Once the data is shuffled, let's split it into train and test sets. The test set will comprise 20% of the data. Note that `train_test_split` does not shuffle the data for you: since vaex cannot assume your data fits into memory, you are responsible for either writing it to disk in shuffled order or shuffling it in memory (as done above)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.128176Z",
     "start_time": "2020-05-01T17:12:37.080094Z"
    }
   },
   "outputs": [],
   "source": [
    "# Train and test split, no shuffling occurs\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2, verbose=False)"
   ]
  },
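  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plain-Python sketch below illustrates what a purely positional split looks like and why the row order matters. It is not the `vaex.ml` implementation: the `sequential_split` helper is hypothetical, and the choice of taking the leading block of rows as the test set is an assumption made for this example only.\n",
    "\n",
    "```python\n",
    "def sequential_split(rows, test_size):\n",
    "    # Split rows into (train, test) by position only; nothing is shuffled here.\n",
    "    n_test = int(len(rows) * test_size)\n",
    "    # Assumption for this sketch: the leading block of rows becomes the test set.\n",
    "    return rows[n_test:], rows[:n_test]\n",
    "\n",
    "\n",
    "# Hypothetical row indices standing in for the DataFrame rows\n",
    "rows = list(range(10))\n",
    "train, test = sequential_split(rows, test_size=0.2)\n",
    "```\n",
    "\n",
    "If `rows` were still sorted by passenger class, a contiguous block like `test` would contain passengers from a single class only, which is why the shuffle has to happen before the split."
   ]
  },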
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sanity checks\n",
    "\n",
    "Before we move on to process the data, let's verify that our train and test sets are \"similar\" enough. We will not be very rigorous here, but just look at basic statistics of some of the key features.\n",
    "\n",
    "For starters, let's check that the ratio of survivors to non-survivors is similar between the train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:37.731294Z",
     "start_time": "2020-05-01T17:12:37.129879Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3deZhkZXn///eHGQRkGBEYMKCAuKEouIzBJaJGI1+3SIBvBFExiWL0i8tPElzRcSEKCkbFCLgBLgRRMCoaxQhuUZJxARwEBQRlUQeQgRkWWe7fH+e0FEV3z+mmuqt6+v26rrqmzvOc5a6emXr6PudZUlVIkiRJkqZuvWEHIEmSJElzlQmVJEmSJE2TCZUkSZIkTZMJlSRJkiRNkwmVJEmSJE2TCZUkSZIkTZMJleaMJF9Lsv+w4xgFSfZL8o2O+65I8pRpXueSJE+fzrEdz9/5c0iSZt9Mtb1JjkvyrkGft+f82yZZnWTBTF1DGmNCpRnVfpmNvW5PcmPP9n5TOVdVPbOqjp+pWPsl2T5JJVk4W9fsqqo+U1XP6LjvTlV15gyHtFbj/Tyn8jk6nH+zJKcmWZPk0iQvWMv+OyT5SpLrk1yV5PDpnkvS3DPI9qk935lJXjoDcb4kyfcGfd6uZrvtna7+G4BV9euqWlRVtw3g3ElyWJKr29fhSTLBvvv1/du6oW37HjPVc2nuGLlfFLVuqapFY++TXAK8tKq+2b9fkoVVdetsxjbK5uLPI8mCQTRcd8OHgT8CWwGPBE5LcnZVrejfMck9gNPbY54P3AY8eDrnkjQ3dW2f5qv2l/xU1e0jEMuw28QDgD2AXYCiaT8uBo7u37GqPgN8Zmw7yUuAQ4AfT/Vcmjt8QqWhSPKUJJcleX2S3wKfTHLv9onByiR/aN/ft+eYP939G7tjl+R97b6/SvLMSa73+iSXt08jLkjytLZ8vSRvSHJRe6foc0k2aw/7Tvvnte1dpsd3+Fx/nmR5kuuS/C7Jkb2ft2/fP91NS7IsyeeTfDrJdcCb2rulm/Xs/6j2Scr6vXcskxyd5H195/6PJK8b5zqTfV6SvKh9InN1kjev5bMel+QjSb6aZA3w1CTPTvKT9vP/JsmynkPu8vPsv/Oa5AlJ/jfJqvbPJ6ztZ94etzGwF3BIVa2uqu8BXwJeNMEhLwGuqKojq2pNVd1UVedM81yS1iGTfU8m2bD9nr46ybXt99RWSQ4FngQc1X6/HTXOecc9tq27V5KPJ7mybavelWRBkofS/KL9+Pa813b8DC9JcnHb5v0q7RO3tq35dM9+d+o50Lazhyb5PnADsMNY25tkgzbuh/ccv6Rtq7Zst5+T5Kftfv+dZOeefR+V5MdtTCcBG64l/u8neX+Sa4BlSR6Q5Fvtz++qJJ9Jsmm7/6eAbYEvtz+ng8f5bFsn+VKSa5JcmORlXX6Wrf2BI6rqsqq6HDiCph3peuwJVVUDOJdGlAmVhuk+wGbAdjR3bNYDPtlubwvcCNylUeqxK3ABsAVwOPDx5K6PzZM8BDgQeGxVbQLsDlzSVr+a5k7Rk4GtgT/QPJ0A2K39c9O228AP0vTJvjbJthPE9AHgA1W1GHgA8LlJfwJ39jzg88CmwHuBH9D8Yj/mBcDnq+qWvuM+Czx/7LMnuTfwDODfx7nGhJ83ycOAj9AkDlsDmwP3HeccvV4AHApsAnwPWAO8uP0MzwZekWSPdt+7/Dx7T9T+wnIa8MH22kfSPBnavK1/Q5KvTBDHg4HbquoXPWVnAztNsP/jgEvSjA24qv2F4RHTPJekdctk7cL+wL2A+9F8T/0jcGNVvRn4LnBg+/124DjnHffYtu544FbggcCjaL7DX1pVP2/3+0F73rEE4gVJzhkv+Pam0AeBZ7Zt3hOAn07h87+Ipk3eBLh0rLCqbgZOAfbt2fdvgW9X1e+TPBr4BPDy9vMdA3ypTcTuAXwR+BRN
u38yd27fxrMrzZObLWnamQDvpvk7eSjNz3FZG9uLgF8Dz21/ToePc74Tgcva4/cG/iV33Fz9i7UkqzvRtANjOrUJSbajaftOuLvn0mgzodIw3Q68rapurqobq+rqqvpCVd1QVdfTfIE+eZLjL62qj7bdzI4H/oymi1a/24ANgIclWb+qLqmqi9q6lwNvbu8U3Uzz5bx3Jhg31fbJ3rSqfj1BTLcAD0yyRft044dr+Rn0+kFVfbGqbq+qG2kSpX3hT10v9mnL+n2XptvAk9rtvdtzXTHOvpN93r2Br1TVd9q6Q2j+jibzH1X1/Tbmm6rqzKo6t90+h6YBm+zvsNezgV9W1aeq6taqOhE4H3guQFW9p6qeM8Gxi4BVfWWraH4hGM99aX6eH6RpXE8D/qNt9Kd6Lknrlsm+J2+hSRYeWFW3VdWPquq6jucd99j2KdUzgde2T8x/D7yf5jtqXFX12araeaJ6mu/uhyfZqKqunGJ35eOqakX7PTzeDbzehOoF3NEuvQw4pqrOaj/f8cDNNDewHgesD/xrVd1SVZ8H/nctcVxRVR9q47ixqi6sqtPb3xlW0tx069S+JLkf8BfA69u26qfAx2h7HlTV98aS1Qn0twurgEXj3cTt82Lgu1X1qwGcSyPMhErDtLKqbhrbSHLPJMek6XJ2HU0XsU0z8Qw9vx17U1U3tG8X9e9UVRcCr6VpFH+f5N+TbN1Wbwec2j51uhb4OU0CNl5i1sU/0DzhOL/tzjFRAjCe3/Rtf56mm8fWNHe4iiZ5upO2G8G/c0cj9wJ6+m/3mezzbt0bQ1WtAa6eSsxJdk1yRppum6to7qxusZZzjNmanruhrUuBbTocuxpY3Fe2GLh+gv1vBL5XVV+rqj8C76P5Reeh0ziXpHXLZN+TnwK+Dvx7kivSTCiwfsfzTnTsdjTJxpU91zyG5snMlLXf3c+n+f69MslpSXacwin626Je3wI2ar/rt6MZY3pqW7cdcNDYZ2g/x/1ovtu3Bi7v6fYGd/2+nzSOJFu27ffl7e8In2Zq7cs17c3a3ut3aV/gru3CYmB13+cZz4tpbvgO4lwaYSZUGqb+L4+DgIcAu7Zd5sa6iN3tuzbt3by/oPnCL+Cwtuo3NN0iNu15bdj2a57yl1tV/bKq9qVpCA8DPt92v1gD3HNsvzZJXNJ/eN+5rgW+QdOl4gXAiZN84Z5Icwd1O5puEl+YYL/JPu+VNI3fWIz3pEkyJv3IfdufpRlvdL+quhdN3/9MsG+/K2j+fnptC1y+luMAfgEsTPKgnrJdgInuyp4zSTxTPZekdcuE35Pt05W3V9XDaLrSPYfml2ZYy3fcJMf+huZJzhY911tcVWPdwKbTFn29qv6KpufG+cBH26o7tUU0Xe/vcvgk572dpiv7vjTt0ld6kpTfAIf2/dzu2fY2uBLYpu8pzERd5yeK491t2c7t7wgv5M6/H0z2c7oC2CxJb0+Dru0LNN//u/Rsr7VNSPJEmkTu83f3XBp9JlQaJZvQPDm4th1P87ZBnDTJQ5L8ZZINgJvaa4zNRnc0cGibiIwNsH1eW7eSptvEDlO41guTLGkbnbH+2LfR/JK+YZpJG9YH3kLTDXFtPkvT4O7F+N39AKiqn7Txfgz4epuMjWeyz/t54DltX/J7AO9g6t8Rm9DcBbwpyZ/TNLhj1vbz/Crw4HZswMIkzwceBkw0bupP2juypwDvSLJx25A9j+aO8Hg+DTwuydPb5Pa1wFXAz6dxLknrlgm/J5M8Nckj2u+N62i68Y21J79jkvZiomOr6kqam2dHJFmcZlKMByQZ6872O+C+7ffyWqWZJOOv25t5N9M8ERmL8afAbmnGA98LeGPHn0mvz9I8AduPO7dLHwX+sX16lfb789ltEvMDmjFir26/3/cE/nyK192k/SzXJtkG+Oe++gl//lX1G+C/gXenmRxkZ5oeJRP15uh3AvC6JNu0vUYOAo5byzH7A1/oeyo23XNpxJlQaZT8
K7ARzS+2PwT+c0Dn3QB4T3ve39I8PXpTW/cBmicq30hyfXvdXeFP3QgPBb7fdl94XO5YKHCiO2v/B1iRZHV77n3a/tqrgFfSJDyX09wlvGyCc/T6EvAg4HdVdfZa9j0ReDqTJF5r+bwrgP/XHn8lzUDsLjH2eiVNInI98FZ6JuUY7+fZe2BVXU1zx/Ygmq6GBwPPqaqrAJK8KcnX1nLtjYDf0/wsXjE2bqD/762qLqC5u3l0+zmfB/x12/1v0nNJWudN+D1J80Tn8zQJ0c+Bb9PcoBk7bu80M89+cJzzTnbsi4F7AOfRfCd9nubpEjTd7FYAv00y9n24X5KJvpPWo/kevQK4hmac0SsBqup04CSap/Q/osMNq35VdRZNG7Y18LWe8uU046iOaj/DhbSz17XfrXu223+gSchOmeKl3w48mmbM0WnjHP9u4C1t+/JP4xy/L7A9zc/lVJox3KcDJHlS225P5Bjgy8C5wM/a6x8zVplkRXrWLkuyIU3vkvHW75r0XJqbYpdNSZIkSZoen1BJkiRJ0jSZUEmSJEnSNJlQSZIkSdI0mVBJkiRJ0jSZUEmSJEnSNC0cdgAzaYsttqjtt99+2GFIkgbkRz/60VVV1b8o9kiyDZKkdcdk7c86nVBtv/32LF++fNhhSJIGJMmlw46hK9sgSVp3TNb+2OVPkiRJkqbJhEqSJEmSpsmESpIkSZKmyYRKkiRJkqbJhEqSJEmSpsmESpIkSZKmyYRKkiRJkqbJhEqSJEmSpmmdXth3Ltv+DacNO4R555L3PHvYIUjS0Nn+DIdtkDR3+YRKkrTOS3JgkuVJbk5yXE/59kkqyeqe1yE99UlyWJKr29fhSTKUDyFJGkk+oZIkzQdXAO8Cdgc2Gqd+06q6dZzyA4A9gF2AAk4HLgaOnqE4JUlzjE+oJEnrvKo6paq+CFw9xUP3B46oqsuq6nLgCOAlg45PkjR3mVBJkgSXJrksySeTbNFTvhNwds/22W2ZJEmACZUkaX67CngssB3wGGAT4DM99YuAVT3bq4BFE42jSnJAO1Zr+cqVK2coZEnSKDGhkiTNW1W1uqqWV9WtVfU74EDgGUkWt7usBhb3HLIYWF1VNcH5jq2qpVW1dMmSJTMbvCRpJJhQSZJ0h7FEaewJ1AqaCSnG7NKWSZIEmFBJkuaBJAuTbAgsABYk2bAt2zXJQ5Ksl2Rz4IPAmVU11s3vBOB1SbZJsjVwEHDcUD6EJGkkmVBJkuaDtwA3Am8AXti+fwuwA/CfwPXAz4CbgX17jjsG+DJwblt/WlsmSRLgOlSSpHmgqpYByyaoPnGS4wo4uH1JknQXPqGSJEmSpGkyoZIkSZKkaZr1hCrJPkl+nmRNkouSPKktf1qS85PckOSMJNv1HJMkhyW5un0dPtEaIJIkSZI0W2Y1oUryV8BhwN/RLJ64G3Bxuyr9KcAhwGbAcuCknkMPAPagma52Z+A5wMtnL3JJkiRJuqvZfkL1duAdVfXDqrq9qi6vqsuBPYEVVXVyVd1EM3B4lyQ7tsftDxxRVZe1+x8BvGSWY5ckSZKkO+mUUCVZkmRJz/Yjkrwryb6THdd3jgXAUmBJkguTXJbkqCQbATsBZ4/tW1VrgIvacvrr2/c7MY4kByRZnmT5ypUru4YnSZIkSVPW9QnV54DnArTd874D/A1wdJKDOp5jK2B9YG/gScAjgUfRrAOyCFjVt/8qmm6BjFO/Clg03jiqqjq2qpZW1dIlS5b0V0uSJEnSwHRNqHYGfti+3xu4sKp2Al5M97FMN7Z/fqiqrqyqq4AjgWcBq4HFffsvpllokXHqFwOr2/VBJEmSJGkouiZUG9EkNQBPB77Uvv8xcL8uJ6iqPwCXAeMlQStoJpwAIMnGwAPa8rvUt+9XIEmSJElD1DWh+iWwZ5L7Ac8AvtGWbwVcO4XrfRJ4VZItk9wbeC3wFeBU4OFJ9kqyIfBW4JyqOr897gTgdUm2SbI1cBBw3BSuK0mSJEkD
1zWhejvNdOeXAD+sqrPa8t2Bn0zheu8E/hf4BfDz9thDq2olsBdwKPAHYFdgn57jjgG+DJwL/Aw4rS2TJEmSpKFZ2GWnqjolybbA1tx5tr1vAl/oerGqugV4Zfvqr/smsONdDmrqCji4fUmSJEnSSFjrE6ok6yf5LbBFVf2kqm4fq6uqs3q65UmSJEnSvLLWhKp9qnQL408mIUmSJEnzVtcxVB8C3pikUxdBSZIkSZoPuiZITwKeDFye5GfAmt7KqvrrQQcmSZIkSaOua0J1FVOYfEKSJEmS5oOus/z93UwHIkmSJElzTdcxVAAkWZrk+Uk2brc3dlyVJEmSpPmqUzKUZCvgS8BjaWb7exBwMXAkcBPwmpkKUJIkSZJGVdcnVO8HfgtsDtzQU34y8IxBByVJkiRJc0HX7npPA55WVX9I0lt+EbDtwKOSJEmSpDmg6xOqjYA/jlO+hKbLnyRJkiTNO10Tqu8AL+nZriQLgNcD/zXooCRJGqQkByZZnuTmJMf1lD8uyelJrkmyMsnJSf6sp35ZkluSrO557TCUDyFJGkldE6qDgZclOR3YADgCOA94IvDGGYpNkqRBuQJ4F/CJvvJ7A8cC2wPbAdcDn+zb56SqWtTzunimg5UkzR1d16E6L8kjgFcANwMb0kxI8eGqunIG45Mk6W6rqlOgWf4DuG9P+dd690tyFPDt2Y1OkjSXdV5Dqqp+C7xtBmORJGnYdgNW9JU9N8k1wJXAUVX1kYkOTnIAcADAtts6Z5MkzQdd16HabYKqopmU4qKqumZgUUmSNMuS7Ay8FXheT/HnaLoE/g7YFfhCkmur6sTxzlFVx7b7s3Tp0prZiCVJo6DrE6ozaZIngLF503u3b0/yJeBFVbVmcOFJkjTzkjwQ+Brwmqr67lh5VZ3Xs9t/J/kAsDcwbkIlSZp/uk5K8Wzg58ALgQe2rxfSdIvYq309EnjPDMQoSdKMSbId8E3gnVX1qbXsXtxxY1GSpM5PqN5Fc9eud4r0i5OsBA6rqsckuQ34EPCqQQcpSdLdkWQhTZu3AFiQZEPgVmAr4Fs0kywdPc5xz6NZOuRa4LHAq4E3zVbckqTR1zWhehhw+Tjll7d1AOcC9xlEUJIkDdhbuPPESi8E3k7zxGkH4G1J/lRfVYvat/vQTLW+AXAZzU3E42clYknSnNC1y995wJuTbDBW0L5/U1sHcD/gt5OdJMmZSW7qWRzxgp66pyU5P8kNSc5ou2CM1SXJYUmubl+HJ7HLhSSpk6paVlXpey2rqre373vXmVrUc9y+VbV5W75jVX1wmJ9DkjR6uiZUrwR2By5vk6IzaJ5O7U6zNhU0d/j+rcO5DuxptB4CkGQL4BTgEGAzYDlwUs8xBwB7ALsAOwPPAV7eMXZJkiRJmhFdF/Y9K8n9abpIPIRmQO6JwGfGZvWrqhPuRhx7Aiuq6mSAJMuAq5LsWFXnA/sDR1TVZW39EcDLgLv0d5ckSZKk2TKVhX3XAMcM4JrvTvIe4ALgzVV1JrATcHbvtZJc1Jaf31/fvt9pALFIkiRJ0rR1TqiS3A94ErAlfV0Fq+rIjqd5Pc2Yqz/SDPT9cpJHAouAlX37rgI2ad8vard76xYlSVXdaeFEV6mXJEmSNFs6JVRJ9qOZ5ehWmsSnN4kpoFNCVVVn9Wwen2Rf4FnAamBx3+6Lgevb9/31i4HV/clUew1XqZckSZI0K7pOSvEO4AhgcVVtX1X373ntcDeuP7ZA4gqaCScASLIx8IC2nP769v0KJEmSJGmIuiZUWwEfq6rbpnuhJJsm2T3JhkkWtk+9dgO+DpwKPDzJXu1ii28FzmknpAA4AXhdkm2SbA0cBBw33VgkSZIkaRC6jqH6KrArcPHduNb6wLuAHYHbaCab2KOqLgBIshdwFPBp4CyaMVZjjqGZlv3cdvtjDGaCDEmSJEmatq4J1enAYUl2oklqbumtrKpT1naCqloJPHaS+m/SJFvj1RVwcPuSJEmSpJHQ
NaEaexr0pnHqClgwmHAkSZIkae7ourBv17FWkiRJkjRvmChJkiRJ0jR1SqjSeGWSFUluSLJDW/6GJH87syFKkiRJ0mjq+oTqNcBbaBbMTU/55cCBgw5KkiRJkuaCrgnVPwIvq6oPALf2lP8Y2GngUUmSJEnSHNB1lr/tgJ+NU34LsNHgwpEkSZJm3vZvOG3YIcxLl7zn2cMOYeC6PqG6GHj0OOXPAs4bXDiSJEmSNHd0fUL1PuCoJPekGUP1+CQvollo9+9nKjhJkiRJGmVd16H6ZJKFwL8A9wQ+RTMhxaur6qQZjE+SJEmSRlbXJ1RU1UeBjybZAlivqn4/c2FJkiRJ0ujrug7VeknWA6iqq4D1krw0yRNmNDpJkiRJGmFdJ6U4DXgVQJJFwHLgvcC3k7x4hmKTJEmSpJHWNaF6DPCt9v2ewHXAlsDLgH+agbgkSRqYJAcmWZ7k5iTH9dU9Lcn5SW5IckaS7XrqkuSwJFe3r8OT5C4XkCTNW10Tqk2Aa9v3zwBOrapbaJKsB8xEYJIkDdAVwLuAT/QWtuOCTwEOATaj6YHRO9nSAcAewC7AzsBzgJfPQrySpDmia0L1a+CJSTYGdgdOb8s3A26YicAkSRqUqjqlqr4IXN1XtSewoqpOrqqbgGXALkl2bOv3B46oqsuq6nLgCOAlsxS2JGkO6JpQHUkzVfplNNOlf6ct3w04dwbikiRpNuwEnD22UVVrgIva8rvUt+93YgJJDmi7Fi5fuXLlDIQrSRo1nRKqqjoGeDzNIr5/UVW3t1UX0XSTkCRpLloErOorW0XT1X28+lXAoonGUVXVsVW1tKqWLlmyZODBSpJGz1TWoVpO07ccgCTrV9VpMxKVJEmzYzWwuK9sMXD9BPWLgdVVVbMQmyRpDui6DtWrk+zVs/1x4MYkFyR5yIxFJ0nSzFpBM+EEAO1Y4Qe05Xepb9+vQJKkVtcxVK8GVgIk2Q34W+AFwE9pBuhKkjSykixMsiGwAFiQZMMkC4FTgYcn2autfytwTlWd3x56AvC6JNsk2Ro4CDhuCB9BkjSiuiZU2wCXtO+fC5xcVZ+jmQ3pcVO9aJIHJbkpyad7ylwHRJI0U94C3Ai8AXhh+/4tVbUS2As4FPgDsCuwT89xxwBfppmA6Wc0C90fM3thS5JGXdeE6jpgbHTtXwH/1b6/BdhwGtf9MPC/YxuuAyJJmklVtayq0vda1tZ9s6p2rKqNquopVXVJz3FVVQdX1Wbt62DHT0mSenVNqL4BfLQdO/VA4Gtt+U7Ar6ZywST70CwS/F89xa4DIkmSJGnO6ZpQ/T/g+8AWwN5VdU1b/mjgxK4XS7IYeAdNH/ReA1sHxDVAJEmSJM2WTtOmV9V1wKvGKX/bFK/3TuDjVfWbviFQi2gnvejRaR2Q/q4XVXUscCzA0qVL7ZYhSZIkacZ0XodqTJL7APfoLauqX3c47pHA04FHjVPtOiCSJEmS5pxOCVWSewEfpJku/R7j7LKgw2meAmwP/Lp9OrWIZurahwFH04yTGrveROuA/E+77Tog0jpi+ze4Pvhsu+Q9zx52CJIkrTO6jqF6H00SswdwE80aVP8MXAY8v+M5jqVJkh7Zvo6mmX52d1wHRJIkSdIc1LXL3zOBfavqu0luA35UVScluZJm+vLPr+0EVXUDcMPYdpLVwE3tGiAk2Qs4Cvg0cBZ3XQdkB5p1QAA+huuASJIkSRqyrgnVpsCl7ftVwObAhcAPaJKbKRtb/6Nn+5vAjhPsW8DB7UuSJEmSRkLXLn8X0TwhAvg5sE+agVB7AtdMeJQkSZIkrcO6JlTHATu3799D083vj8B7gcMGH5YkSZIkjb6u61C9v+f9t5I8FHgM8MuqOnfiIyVJkiRp3TXldagAqupS7hhTJUmSJEnzUtcufyTZI8l3klzVvr6b5G9mMjhJkiRJGmWdEqokBwEnARdwx2x75wOfTfJPMxeeJEmSJI2url3+/gk4sKo+2lP2iST/A7yD
ZuFfSZIkSZpXunb5WwScMU75GW2dJEmSJM07XROqLwJ7j1O+F/ClwYUjSZIkSXNH1y5/FwJvSPJU4Adt2ePa15FJXje2Y1UdOdgQJUmSJGk0dU2oXgL8AXhw+xrzB+DverYLMKGSJEmSNC90Xdj3/jMdiCRJkiTNNZ3XoZIkSZIk3ZkJlSRJkiRNkwmVJGleS7K673Vbkg+1ddsnqb76Q4YdsyRpdHSdlEKSpHVSVf1pPcUkGwO/A07u223Tqrp1VgOTJM0JEz6hSvKJJJu073dLYvIlSVrX7Q38HvjusAORJM0Nk3X5eyGwcfv+DGCzmQ9HkqSh2h84oaqqr/zSJJcl+WSSLYYRmCRpNE321OkS4FVJvgEEeHySP4y3Y1V9ZwZikyRp1iTZFngy8A89xVcBjwV+CmwOfBj4DLD7BOc4ADgAYNttt53JcCVJI2KyhOqfgY8Cb6RZsPfUCfYrYMGA45Ikaba9GPheVf1qrKCqVgPL283fJTkQuDLJ4qq6rv8EVXUscCzA0qVL+59ySZLWQRN2+auq/6iqLWm6+gXYCVgyzmvLrhdL8ukkVya5Lskvkry0p+5pSc5PckOSM5Js11OXJIclubp9HZ4kU/60kiRN7MXA8WvZZyxJsg2SJAEdZvmrqmuTPBX45QBmOHo38A9VdXOSHYEzk/wEuBQ4BXgp8GXgncBJwOPa4w4A9gB2oWnMTgcuBo6+m/FIkkSSJwDb0De7X5JdgWuBXwL3Bj4InFlVq2Y9SEnSSOo0c19VfTvJBkleDDyMJqk5D/hsVd3c9WJVtaJ3s309AHgMsKKqTgZIsgy4KsmOVXU+zSDhI6rqsrb+COBlmFBJkgZjf+CUqrq+r3wH4F9oemNcR3NDb99Zjk2SNMI6Leyb5GHAL4AjgV1pnhy9H/hFkodO5YJJ/i3JDcD5wJXAV2m6E549tk9VrQEuasvpr2/f78Q4khyQZHmS5StXrpxKaJKkeaqqXl5VLxqn/MSqun9VbVxVf1ZVL66q3w4jRknSaOqUUAEfoJnhaNuqelJVPQnYliax+depXLCqXglsAjyJppvfzcAioL/7xKp2P8apXwUsGm8cVVUdW1VLq2rpkiVLphKaJEmSJE1J18V6nwg8tndGoy8Oq9wAABDwSURBVKq6LsmbgR9O9aJVdRvwvSQvBF4BrAYW9+22GBjretFfvxhYPc46IZIkSZI0a7o+oboJ2HSc8nu1ddO1kGYM1QqaCScASLJxTzn99e373vFYkiRJkjTruiZUXwY+muSJSRa0r78AjgG+1OUESbZMsk+SRe3xu9MM7P0WzRpXD0+yV5INgbcC57QTUgCcALwuyTZJtgYOAo7r/CklSZIkaQZ07fL3Gpq1Ob4L3NaWrUeTTL224zmKpnvf0e2xlwKvrar/AEiyF3AU8GngLGCfnmOPoZlp6dx2+2NtmSRJkiQNTddp068FnpfkgcBDaRY0PK+qLux6oapaCTx5kvpvAjtOUFfAwe1LkiRJkkZC1ydUALQJVOckSpIkSZLWZV3HUEmSJEmS+phQSZIkSdI0mVBJkiRJ0jStNaFKsjDJK9vpyiVJkiRJrbUmVFV1K/BeYP2ZD0eSJEmS5o6uXf5+CDx6JgORJEmSpLmm67TpHwWOSLId8CNgTW9lVf140IFJkiRJ0qjrmlB9tv3zyHHqClgwmHAkSZIkae7omlDdf0ajkCRJkqQ5qFNCVVWXznQgkiRJkjTXdF6HKskzk3wlyXlJ7teWvTTJ02YuPEmSJEkaXZ0SqiT7AZ8DfknT/W9sCvUFwMEzE5okSZIkjbauT6gOBl5WVf8fcGtP+Q+BRw48KkmSJEmaA7omVA8CfjBO+Wpg8eDCkSRJkqS5o2tCdQXw4HHKdwMuGlw4kiRJkjR3dE2ojgU+mOSJ7fb9kuwPHA58ZEYikyRpliQ5M8lNSVa3rwt66p6W5PwkNyQ5o13kXpIkoGNCVVWHA6cApwMbA2cARwNHV9WHZy48
SZJmzYFVtah9PQQgyRY07d8hwGbAcuCkIcYoSRoxXRf2parenORQ4GE0idh5VbV6xiKTJGn49gRWVNXJAEmWAVcl2bGqzh9qZJKkkdB5HapWATcBNwC3DT4cSZKG5t1Jrkry/SRPact2As4e26Gq1tCMHd5pCPFJkkZQ13WoNkjyr8A1NA3LOcA1ST6QZMMpnOPjSS5Ncn2SnyR5Zk/9hH3U0zgsydXt6/AkmdpHlSRpQq8HdgC2oRk3/OUkDwAWAav69l0FbDLeSZIckGR5kuUrV66cyXglSSOi6xOqjwB7Ay+lmUL9ge37vwH+reM5FgK/AZ4M3IumP/rnkmzfoY/6AcAewC7AzsBzgJd3vK4kSZOqqrOq6vqqurmqjge+DzyL8ZcHWQxcP8F5jq2qpVW1dMmSJTMbtCRpJHQdQ/V/gT2r6vSesouT/B74AvD3aztB201iWU/RV5L8CngMsDmT91HfHziiqi5r648AXkYzMYYkSYNWQIAVNG0QAEk2Bh7QlkuS1PkJ1Rrg8nHKLwdunM6Fk2xFs7bVCtbeR/1O9e17+69Lku62JJsm2T3JhkkWJtmPZp3FrwOnAg9Pslfbxf2twDlOSCFJGtM1ofoQ8LYkG40VtO8PaeumJMn6wGeA49tGaW191PvrVwGLxhtHZf91SdIUrQ+8C1gJXAW8Ctijqi6oqpXAXsChwB+AXYF9hhWoJGn0TNjlL8mX+oqeAlye5Jx2+xHt8RtP5YJJ1gM+BfwROLAtXlsf9f76xcDqqqr+81fVsTQDilm6dOld6iVJ6tUmTY+dpP6bwI6zF5EkaS6ZbAzV1X3bX+jb/tVUL9Y+Ufo4sBXwrKq6pa1aWx/1FTQTUvxPu70L9l+XJEmSNGQTJlRV9XczcL2PAA8Fnl5VvWOvTgXem2Qv4DTu2kf9BOB1Sb5KM1D4IKbR1VCSJEmSBmmqC/tOW7uu1MuBRwK/TbK6fe3XoY/6McCXgXOBn9EkXcfMVuySJEmSNJ5O06YnuTfNlOdPBbakLxGrqi3Xdo6qupRmCtqJ6ifso96OlTq4fUmSJEnSSOi6DtUJNNOUHw/8jqbbnSRJkiTNa10TqqcAT66qH89gLJIkSZI0p3QdQ3XRFPaVJEmSpHmha5L0GuDdSXZJsmAmA5IkSZKkuaJrl78LgY2AHwM0y0ndoapMsiRJkiTNO10TqhOBewGvxkkpJEmSJAnonlAtBf68qn42k8FIkiRJ0lzSdQzVecDimQxEkiRJkuaargnVW4Ajkzw9yVZJNut9zWSAkiRJkjSqunb5+2r75ze48/iptNtOSiFJkiRp3umaUD11RqOQJEmSpDmoU0JVVd+e6UAkSZIkaa7plFAlefRk9VX148GEI0mSJElzR9cuf8tpxkr1rujbO5bKMVSSJEmS5p2uCdX9+7bXBx4FvBl440AjkiRJkqQ5ousYqkvHKb4wySrgbcDXBhqVJEmSJM0BXdehmsivgEcOIhBJkiRJmmu6TkrRv3hvgD8DlgEXDDgmSZIkSZoTuo6huoo7T0IBTVL1G+D5A41IkiRJkuaI6S7sezuwEriwqm4dbEiSJM2eJBsA/wY8HdgMuBB4U1V9Lcn2NN3b1/QcclhVvXO245QkjSYX9pUkzXcLaXpcPBn4NfAs4HNJHtGzz6beQJQkjWfSSSmSbNbl1fViSQ5MsjzJzUmO66t7WpLzk9yQ5Iwk2/XUJclhSa5uX4cnyV0uIEnSFFXVmqpaVlWXVNXtVfUVmqdSjxl2bJKk0be2Wf6uounaN9nr91O43hXAu4BP9BYm2QI4BTiEprvFcuCknl0OAPYAdgF2Bp4DvHwK15UkqZMkWwEPBlb0FF+a5LIkn2zbrImOPaC9cbh85cqVMx6rJGn41tblr3/sVK//A7wG6NwFoqpOAUiyFLhvT9WewIqqOrmtXwZclWTHqjof2B84oqoua+uPAF4GHN312pIkrU2S9YHPAMdX1flJFgGPBX4KbA58uK3f
fbzjq+pY4FiApUuX9k/mJElaB02aUI03dirJo4HDgN2AY4BBDMzdCTi757prklzUlp/fX9++32m8EyU5gOaJFttuu+0AQpMkzQdJ1gM+BfwROBCgqlbT9JoA+F2SA4ErkyyuquuGE6kkaZR0Xtg3yf2TfBY4C7gGeFhVvbqqBtGnYRGwqq9sFbDJBPWrgEXjjaOqqmOramlVLV2yZMkAQpMkreva9uTjwFbAXlV1ywS7jj11chyvJAnokFAl2TzJB2ieFN0HeHxVPb+qLhpgHKuBxX1li4HrJ6hfDKyuKrtTSJIG4SPAQ4HnVtWNY4VJdk3ykCTrJdkc+CBwZlX13wSUJM1Ta5vl703ARTRTyT6vqv6yqpZPdsw0raCZcGLsuhsDD+COAcF3qm/f9w4WliRpWtpZZV8OPBL4bZLV7Ws/YAfgP2lu8P0MuBnYd2jBSpJGztompXgXcCNwGfDKJK8cb6eq+usuF0uysL3mAmBBkg1pJrU4FXhvkr2A04C3Aue0E1IAnAC8LslXabpbHAR8qMs1JUmaTFVdyuRd+E6crVgkSXPP2hKqE7ijv/ggvAV4W8/2C4G3V9WyNpk6Cvg0zTitfXr2O4bmLuG57fbH2jJJkiRJGpq1zfL3kkFerKqWAcsmqPsmsOMEdQUc3L4kSZIkaSR0nuVPkiRJknRnJlSSJEmSNE0mVJIkSZI0TSZUkiRJkjRNJlSSJEmSNE0mVJIkSZI0TSZUkiRJkjRNJlSSJEmSNE0mVJIkSZI0TSZUkiRJkjRNJlSSJEmSNE0mVJIkSZI0TSZUkiRJkjRNJlSSJEmSNE0mVJIkSZI0TSZUkiRJkjRNJlSSJEmSNE0mVJIkSZI0TSZUkiRJkjRNJlSSJEmSNE1zJqFKslmSU5OsSXJpkhcMOyZJ0vxgGyRJmsjCYQcwBR8G/ghsBTwSOC3J2VW1YrhhSZLmAdsgSdK45sQTqiQbA3sBh1TV6qr6HvAl4EXDjUyStK6zDZIkTWZOJFTAg4HbquoXPWVnAzsNKR5J0vxhGyRJmtBc6fK3CFjVV7YK2KR/xyQHAAe0m6uTXDDDsenOtgCuGnYQ05HDhh2B5hj/rQ/HdkO4pm3Q3OH/S80X/luffRO2P3MloVoNLO4rWwxc379jVR0LHDsbQemukiyvqqXDjkOaaf5bn1dsg+YI/19qvvDf+miZK13+fgEsTPKgnrJdAAcDS5Jmmm2QJGlCcyKhqqo1wCnAO5JsnOSJwPOATw03MknSus42SJI0mTmRULVeCWwE/B44EXiF09WOJLu6aL7w3/r8Yhs0N/j/UvOF/9ZHSKpq2DFIkiRJ0pw0l55QSZIkSdJIMaGSJEmSpGkyoZIkSZKkaTKhkqQOkmyQ5NAkFydZ1ZY9I8mBw45NkrTusv0ZfSZUGogkD01ySJIPt9s7Jtl52HFJA/R+4OHAfsDYbD4rgFcMLSJJgG2Q1nm2PyPOhEp3W5L/C3wb2AZ4UVu8CDhyaEFJg/c3wAuq6gfA7QBVdTnNv3tJQ2IbpHnA9mfEmVBpEN4BPKOq/hG4rS07G9hleCFJA/dHYGFvQZIlwNXDCUdSyzZI6zrbnxFnQqVB2JKm8YI7HkVXz3tpXXAycHyS+wMk+TPgKODfhxqVJNsgretsf0acCZUG4Ufc0c1izD7A/wwhFmmmvAm4BDgX2BT4JXAF8PYhxiTJNkjrPtufEZcqb+Do7kmyI/AN4FfA44AzgQfTdMH45RBDk2ZE29XiqvILVBo62yDNJ7Y/o8mESgOR5J7Ac4DtgN8AX6mq1cONShqcJDtMVFdVF89mLJLuzDZI6zLbn9FnQqWBa//j31ZVlw47FmlQktxOMyYjPcUFUFULhhKUpLuwDdK6xvZn9DmGSndbkhOTPKF9/3c0ayOcl+QfhhuZNDhVtV5VLWj/XA/YGjiWu47dkDSLbIO0rrP9GX0+odLdluT3wH2r6o9JzgX+EbgW+GJVPWi40UkzJ8kGwC+qarth
xyLNV7ZBmo9sf0bLwrXvIq3VPdqGbBtgs6r6PkCSrYYclzTTHgLcc9hBSPOcbZDmI9ufEWJCpUH4aZI30gwGPg2gbdiuG2pU0gAl+S53XtfmnsBONIuKShoe2yCt02x/Rp8JlQbhH4B3ArcA/9yWPR74zNAikgbvY33ba4CznZZZGjrbIK3rbH9GnGOoJGktkiwAPgEcUFU3DzseSdL8YPszN5hQaVqS/H2X/arqEzMdizQbklwJbFtVtww7Fmm+sw3SfGL7M/pMqDQtSc7osFtV1V/OeDDSLEhyMLAp8DYbNWm4bIM0n9j+jD4TKkmaRJJ9q+rEJL8B7gPcBqykZ4BwVW07rPgkSesm25+5w4RKA5Uk9KzkXVW3DzEc6W5Lcl1VLU7y5In2qapvz2ZMksZnG6R1ie3P3GFCpbutnZ72KGA3mkfSf1JVC4YSlDQgSa6vqk2GHYek8dkGaV1l+zN3OG26BuFo4AbgacC3aRq1ZcBXhxiTNCgLkjyVnrve/arqW7MYj6Q7sw3Susr2Z47wCZXutiRX08w+sybJtVW1aZLNgP+uqh2HHZ90dyS5DbiUiRu0qqodZjEkST1sg7Susv2ZO3xCpUG4Dbi1fX9tkiU0K9RvM7yQpIFZY4MljTTbIK2rbH/miPWGHYDmriT3ad+eBTyrff914CTgFGD5MOKSJK37bIMkjQq7/Gnaemaf2ZQmOf8YsB/wT8Ai4F+r6sphxijdXQ4KlkaTbZDWdbY/c4cJlaat/z96kmuqarNhxiRJmh9sgySNCrv86e4wG5ckDYttkKSR4KQUujsW9k3n2b/tdJ6SpJliGyRpJNjlT9OW5BImv0PodJ6SpBlhGyRpVJhQSZIkSdI0OYZKkiRJkqbJhEqSJEmSpsmESpIkSZKmyYRKkiRJkqbJhEqSJEmSpun/B23eD7xBAH87AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Inspect the target variable\n",
    "train_survived_value_counts = df_train.survived.value_counts()\n",
    "test_survived_value_counts = df_test.survived.value_counts()\n",
    "\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "train_survived_value_counts.plot.bar()\n",
    "train_sex_ratio = train_survived_value_counts[True]/train_survived_value_counts[False]\n",
    "plt.title(f'Train set: survivied ratio: {train_sex_ratio:.2f}')\n",
    "plt.ylabel('Number of passengers')\n",
    "\n",
    "plt.subplot(122)\n",
    "test_survived_value_counts.plot.bar()\n",
    "test_sex_ratio = test_survived_value_counts[True]/test_survived_value_counts[False]\n",
    "plt.title(f'Test set: surived ratio: {test_sex_ratio:.2f}')\n",
    "\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next up, let's check whether the ratio of male to female passengers is not too dissimilar between the two sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.073343Z",
     "start_time": "2020-05-01T17:12:37.733604Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3debhkVXnv8e+PIYA0DQItEZTJCUXFoRUnQMVgNBoJeBUn0AQh8eJwxSgqIiqoqGgUVAYHQIUgCokIDhjn2VaD2AjKKLPN1NDM4Hv/2PtAUZxhn9PndNXp8/08z35611p7eKu6u95ae629dqoKSZIkSdLkrTLoACRJkiRptrJBJUmSJElTZINKkiRJkqbIBpUkSZIkTZENKkmSJEmaIhtUkiRJkjRFNqg0o5J8I8keg45jRUnyzCSXDjiGf0tyVZJlSTZYged9dZIfr6jzjXL+dyT5zKDOL0mDkOTAJF8c4PmT5PNJrkvyyxV87mOSHLQiz9l3/jn1G0djs0Gl+2h/iI8sf01yS8/rV0zmWFX1vKo6dqZi7Zdk8ySVZLUVdc5hkmR14KPATlU1r6quGXRMM2G0hmtVvb+q9pym478vyVlJ7kxy4ATbrpHkiLYRe22SU5NsMpVjSbrHdOai9njfTzIt3xF9xx3oxaQh8Azg74AHVdWTBx3MTBmt4Tpdv3GS/E2SryS5qP0N88wJtl/Wt9yV5LCe+j2TnNfWfTPJxssbo8Zng0r30f4Qn1dV84A/Ay/sKfvSyHZztdEy5DYC1gQWDzqQqWqvdg76u+k84K3AaR22fSPwVOCxwMbA9cBhPfWTOZakVtdcpIHbDLioqm4adCBTNSS/Z34MvBK4cqIN+/5vbATcApwEkGQH4P3Ai4D1gQuBE2YqaDUG/aNFs8hIr0CStyW5Evh8kvsn+XqSJW13/9eTPKhnn7uvCI5cxUvykXbbC5M8b5zzvS3JZUluTHJukh3b8lWS7Jfk/CTXJPlykvXb3X7Y/nl9e2XmqR3e14FJTkryxfZcZyV5eJK3J/lLkkuS7NSz/WuS/KHd9oIke49z7I2TfLX9fC5M8oYxtntKkiuTrNpT9k9JfteuPznJoiQ3tD0hHx3lGA8Hzu15/99ty7dKckbbe3Jukpf07HNMkk+lGbawLMlPkvxtkv9o/47OSfL4nu1HPvcbk5yd5J/Gee9jnneUbb+f5OAkPwFuBrYc63NOsjbwDWDjnqtzG/dfPUzyj0kWJ7m+Pf4jxzp/v6o6tqq+AdzYYfMtgG9V1VVVdSvwn8DWUzyWpAmMlwOSrNl+l1/T/t//VZKNkhwMbAcc3n5nHD7KcUfdt61bN8lnk1zR5qWDkqzafq8cATy1Pe71Hd/D99tj/LTd79QkGyT5Uvs9/6skm/ds//E2F92Q5NdJthvn2E9pj3t9kjMzRm9H+xl+pa/s40k+0a6/uv3uvbHNX/fpFUzyL8Bnet7/e9ryFyT53zaGnyZ5bM8+FyX59yS/S3JT+7lu1OahG5N8J8n9e7Y/KU1+XJrkh0m27o+jZ9sxzzvKtpXk/yb5E/Cnnvd/n885yd8D7wBe2r7PM9vy3t84qyTZP8nFaX47HJdk3bHO36uqbq+q/6iqHwN3ddmnx4uBvwA/al+/EDipqhZX1e3A+4DtkzxkksfVJNig0mT9Lc0Vj82AvWj+DX2+fb0pzVWS+ySqHtvS/OjfEPgQ8Nkk6d8oySOAfYAnVdU6wHOBi9rqNwA7AzvQ9AhcB3yyrdu+/XO99urNz5Js2n65bjpOXC8EvgDcH/gt8K32vW0CvBc4smfbvwAvAOYDrwE+luQJo7yHVYBTgTPb4+wIvCnJc/u3raqfAzcBz+4pfjlwfLv+ceDjVTUfeAjw5VGO8Ufu+SG/XlU9u218nNEe5wHAy4BP9SWklwD70/yd3Ab8DPhN+/orNEMIR5xP86NkXeA9wBeTPHCU
997lvP1eRfNvah3gYsb4nNuroM8DLu+5Snd53/kfTnNF7k3AAuB04NQkf9PWfyrJp8aJZTI+Czw9TaPufsAraBp8kmbGeDlgD5rvpwcDGwD/CtxSVe+k+cG5T/udsc8oxx1137buWOBO4KHA44GdgD2r6g/tdj9rj7seQJKXp70gNo7daL73NqH5Xv8ZTT5dH/gD8O6ebX8FPK6tOx44Kcma/QdMM9z4NOCgdtu3AF9NsmCU858APD/J/HbfVWnywfHtd/gngOe1OfhpwP/2H6CqPtv3/t/d5sPPAXvTfI5HAl9LskbPrrvSDBN8OE3+/QZNg2VDmtzbe/HxG8DDaHLJb4BReyc7nrffzjS/Sx7Vvh71c66qb9L0+pzYvs9tRjnWq9vlWcCWwDx6fg+1DciXjxPLVO0BHFdVNXKqdqHnNcCjZ+Dcatmg0mT9FXh3Vd1WVbdU1TVV9dWqurmqbgQOpklyY7m4qo6uqrtoEtQDabqr+90FrAE8KsnqVXVRVZ3f1u0NvLOqLq2q24ADgRdnjC77qvpzVa1XVX8eJ64fVdW3qupOmm7zBcAHq+oOmh6HzZOs1x7vtKo6vxo/AL5N08jo9yRgQVW9t736dAFwNE0SHc0JNA0PkqwDPJ97uunvAB6aZMOqWtY2wLp4Ac1QjM9X1Z1V9RvgqzRXtEacUlW/bntXTgFurarj2r+jE2l+PNC+95Oq6vKq+mtVnUhzVW+0MfNdztvvmPaK2p1VdcckPufRvBQ4rarOaP8OPwKsRfOjgKp6XVW9ruOxJvJHmuFIlwE3AI+kaYRLmhnj5YA7aH5MP7Sq7mq/227oeNxR903TS/U84E1VdVNV/QX4GGN/l1NVx1fVmL0jrc+333FLaRoN51fVd3ryUO937xfbfHtnVR1Kkx8fMcoxXwmcXlWnt9/TZwCLaPJJf4wX0zRQdm6Lng3c3JNf/go8OslaVXVFVXUdSv5a4Miq+kX7OR5Lc7HuKT3bHNb26l9G09D9RVX9tv37PKXvvX+uqm7s+bveZoyeny7n7feBqrq2qm5pz9X1cx7NK4CPVtUFVbUMeDuw28hvk6p6bFUdP+4RJqm9ULwDze+pEacDL0ny2CRrAQcABdxvOs+te7NBpcla0v7wBiDJ/ZIc2XZx30Az5G699Axd63P32OCqurldnde/UVWdR9O7cCDwlyT/mXtuqtwMOKXtdbqe5kreXYzeMOvqqp71W4Cr2wbFyOu740zyvCQ/TzOU7XqaRLXhKMfcjGZY2vU9sb5jnDiPB3Zpr6btAvymTXgA/0JzJe+cNENBXtDxfW0GbNsXwytoehrHeu/9r+/++0mye89wiutprniN9d4nOm+/S3pfTOJzHs3GNL1cAFTVX9vjbzLmHlP3aZr71jYA1gZOxh4qaSaNlwO+QDPC4D+TXJ7kQ2km6+lirH03A1YHrug555E0PSbLYzLfvfumGQK9tD3/uoz93ft/+r57n0Fz8XI0x9NeyKNnVEQ1IwFeStP7dEWS05Js1fF9bQbs2xfDg2m+l0d0eu9phlV+MM3wzhu4Z6TKWO99ovP26887XT/n0dwr77Trq7F8v00msjvw46q6cKSgqv6Hpnfzq20MF9EMOR/oDMQrOxtUmqzqe70vzdWbbasZjjYy5O4+w/gmfaLmCt8zaL4kCzikrbqEZhjCej3Lmu2Vrv74plXb2PkqTY/HRtUM7zid0d/vJcCFfXGuU1X3uVIIUFVn03z5PY97D/ejqv5UVS+jSeCHAF9ph2RM5BLgB30xzKuqf+v+rhtJNqPpYdsH2KB9779n7Pc+2fPe/XfX4XOe6O/5cpp/NyPHC01ivWyC/aZiG5retWvbK6iHAU9O0jUJS5qcMXNA27v9nqp6FE2P9AtofnTCBN8b4+x7CU1Px4Y955tfVSNDmGc672wHvI1mON792+/DpYz93fuFvs9m7ar64BiHPwl4Zpp7n/+Je+edb1XV39E0xs6h+f7v4hLg4L4Y
7ldVU5kY4eU0kys8h6Zxs3lbPtZ7n+x5e/PORJ/zpPIOzW0Qd3LvxuJ02517904BUFWfrKqHVdUDaHLpajT5WjPEBpWW1zo0V5OuT3NT8Lsn2L6TJI9I8uz2h/Wt7TlGeoyOAA5uf+CTZEGSF7V1S2iGKWw5HXGM4m9ohgAsAe5MM6nGTmNs+0vghjSTa6zVXml7dJInjXP842nGjm9PO2MPQJJXJlnQ9rSM3PTc5cbVrwMPT/KqJKu3y5MyiQkaeqxNk1CWtDG9hrHHZC/veSf6nK8CNhhj2Ac095j9Q5Id2yvM+9L8IPppl5O38a5J8x25Wpqb1cfqdf0VsHuam9ZXB15Hc3/X1VM4lqSJjZkDkjwryWPa/2M30AzjG/muvIpxcsNY+1bVFTRDjg9NMj/N5AMPSTOb2shxH5T2Hs0ZsA7ND/MlNN8hB9DcWzqaLwIvTPLcNuesmWZCqQeNtnFVLQG+T3Pv1oXV3BNGmkki/rG9cHcbsIzukyUcDfxrkm3TWDvJP6QZyj5Z67Tnv4ZmyNr7Z/C8E33OV9EM/x/rt/MJwP9LskWSedxzz9WdXU6e5hEcI/fF/U37dzfmxekkT6MZdXFSX/ma7W+NpBkSeBTNPdjXdYlDU2ODSsvrP2juTbka+DnwzWk67hrAB9vjXknTM/OOtu7jwNeAbye5sT3vtnD3MMKDgZ+0Xf5PSTMpxbKMPylFJ9XcJ/YGmh/s19FcPfvaGNveRXOz7eNopi29mmY2pPFm/TkBeCbw3ZEf5K2/BxYnWUbz/nfrHXo5Qbw70Yz1v5zmszyE5vOdlLYH7VCaG6evAh4D/GQmzjvR51xV59B8Vhe0f88b9+1/Ls29BIfRfO4vpJly+XaANM+NOmKcEI6macS/DHhnu/6qdt/t2r+HEW+hafT/iSYRP5/mSu+Ex5I0JWPmAJphxV+haRD9AfgBTSNjZL8Xp5nB9BOjHHe8fXenudBzNs130le4Zxjdd2keVXFlkpELKa9IMl2Pr/gWzTDiP9KMYriVvqFqI6rqEpoenXfQfB9dAvw74//eO56mB6j3/p5VaC5EXQ5cS3OfTqf7TqtqEc39TIfTfFbn0UzWMBXH0bzny2g++zHvH56G8070OY80XK5J8ptR9v8czbDRH9Lk/FuB149Uppl1drznp51Lkx82aWO5hbbHK82D6/uHku8BnNzmy15r0vxdLqO5sPsz4F3jnFfTIFUz2lMtSZIkSSste6gkSZIkaYpsUEmSJEnSFNmgkiRJkqQpskElSVrpJdknyaIktyU5pqf8Fe2kNSPLzUkqyRPb+gOT3NG3zUzNIipJmoVWWIOqLxktS3JXksN66ndMck6bzL43Mh1qW5ckhyS5pl0+NN5UkpIk9bkcOIhmJq67VdWX2mekzauqeTQzmV0A9M7idWLvNlV1wYoLW5I07FZbUSdqExUA7XMNrqKdgjLNAzBPBvYETgXeB5wIPKXdZS9gZ5oHaBZwBk3CG2/aYzbccMPafPPNp/NtSJIG6Ne//vXVVbVgsvtV1ckASRYCoz6Tp7UHcFxNwxS45iBJWnmMl39WWIOqz4uBvwA/al/vAiyuqpEG1oHA1Um2ap83swdwaFVd2tYfSvOsgXEbVJtvvjmLFi2amXcgSVrhklw8g8fejOah2v/cV/XCJNcCVwCHV9WnxznGXjQXAdl0003NQZK0khgv/wzqHqr+K4BbA2eOVFbVTcD5bfl96tv1rRlFkr3acfKLlixZMu2BS5JWWrsDP6qqC3vKvgw8ElhAcyHvgCQvG+sAVXVUVS2sqoULFky6I02SNAut8AZVkk1pnrh9bE/xPGBp36ZLgXXGqF8KzBvtPiqTmSRpinbn3rmJqjq7qi6vqruq6qfAx2lGWUiSBAymh2p34Md9VwCXAfP7tpsP3DhG/Xxg2XSMcZckKcnTgY2Br0ywaQFOiiRJutugGlTH9pUtpplwArh70oqHtOX3qW/XFyNJUgdJVkuyJrAq
sGqSNZP03ke8B/DVqrqxb78XJbl/O9vsk4E3AP+94iKXJA27FdqgSvI0YBPa2f16nAI8OsmubcI7APhdOyEFwHHAm5NskmRjYF/gmBUUtiRp9tsfuAXYD3hlu74/QJt3XsJ9L/YB7AacRzNi4jjgkKoabTtJ0hy1omf52wM4uf8KYFUtSbIrcDjwReAXNElsxJHAlsBZ7evPtGWSJE2oqg4EDhyj7lZgvTHqxpyAQpIkWMENqqrae5y67wBbjVFXwFvbRZIkSZKGwqCeQ6UJbL7faYMOYc656IP/MOgQJGngzD+DYQ6SZq9BPYdKkiRJkmY9G1SSJEmSNEU2qCRJkiRpimxQSZIkSdIU2aCSJEmSpCmyQSVJkiRJU2SDSpIkSZKmyAaVJEmSJE2RDSpJkiRJmiIbVJIkSZI0RTaoJEmSJGmKOjWokixIsqDn9WOSHJTkZTMXmiRJkiQNt649VF8GXgiQZEPgh8A/AUck2XeGYpMkSZKkoda1QfVY4Oft+ouB86pqa2B3YO+ZCEySJEmShl3XBtVawLJ2/TnA19r13wAPnu6gJEmSJGk26Nqg+hOwS5IHAzsB327LNwKun4nAJEmSJGnYdW1QvQc4BLgI+HlV/aItfy7w28mcMMluSf6Q5KYk5yfZri3fMck5SW5O8r0km/XskySHJLmmXT6UJJM5ryRp7kqyT5JFSW5LckxP+eZJKsmynuVdPfXmH0nSuFbrslFVnZxkU2Bj4Myequ8AX+16siR/R9MweynwS+CBbfmGwMnAnsCpwPuAE4GntLvuBewMbAMUcAZwAXBE13NLkua0y4GDaC4ErjVK/XpVdeco5eYfSdK4JuyhSrJ6kiuBDavqt1X115G6qvpFVZ0zifO9B3hvVf28qv5aVZdV1WXALsDiqjqpqm4FDgS2SbJVu98ewKFVdWm7/aHAqydxXknSHFZVJ1fVfwHXTHJX848kaVwTNqiq6g7gDporc1OWZFVgIbAgyXlJLk1yeJK1gK3p6fmqqpuA89ty+uvb9a2RJGl6XNzmpc+3oyZGTCr/JNmrHVq4aMmSJTMVqyRpiHS9h+ow4O1JOg0RHMNGwOo0065vBzwOeDywPzAPWNq3/VJgnXa9v34pMG+0cewmM0nSJFwNPAnYDHgiTd75Uk995/wDUFVHVdXCqlq4YMGCGQpZkjRMujaQtgN2AC5L8nvgpt7KqvrHDse4pf3zsKq6AiDJR2kaVD8E5vdtPx+4sV1f1lc/H1hWVffpNauqo4CjABYuXLhcvWqSpJVbVS0DFrUvr0qyD3BFkvlVdQOTyD+SpLmpa4PqaiYx+cRoquq6JJcy+tDBxTTj1AFIsjbwkLZ8pH4bmoksaNcXI0nS9BrJUSM9UOYfSdK4us7y95ppOt/ngdcn+SbNfVlvAr4OnAJ8OMmuwGnAAcDveia8OA54c5LTaZLdvjTDECVJmlA7ZH01YFVg1SRrAnfSDPO7nuZ5i/cHPgF8v6pGhvmZfyRJ4+p6DxUASRYmeWnbg0SStSd5X9X7gF8BfwT+QPMMq4OragmwK3AwcB2wLbBbz35H0kynfhbwe5pG15GTiV2SNKftTzP0fD/gle36/sCWwDdphpj/HrgNeFnPfuYfSdK4OjWGkmwEfI3mxt0CHkbzHI6PArcCb+xynHbGwNe1S3/dd4Ct7rNTU1fAW9tFkqRJqaoDaR7JMZoTxtnP/CNJGlfXHqqPAVcCGwA395SfBOw03UFJkiRJ0mzQdbjejsCO7cQSveXnA5tOe1SSJEmSNAt07aFaC7h9lPIFNEP+JEmSJGnO6dqg+iHw6p7XlWRV4G3A/0x3UJIkSZI0G3Qd8vdW4AdJngSsARwKbA2sCzx9hmKTJEmSpKHWqYeqqs4GHgP8FPg2sCbNhBSPr6rzZy48SZIkSRpenZ8hVVVXAu+ewVgkSZIkaVbp+hyq7ceoKppJKc6vqmunLSpJkiRJmgW69lB9n6bxBDAyb3rv678m+Rrw
qqq6afrCkyRJkqTh1XWWv38A/gC8Enhou7wSWAzs2i6PAz44AzFKkiRJ0lDq2kN1EPDGquqdIv2CJEuAQ6rqiUnuAg4DXj/dQUqSJEnSMOraQ/Uo4LJRyi9r6wDOAv52OoKSJEmSpNmga4PqbOCdSdYYKWjX39HWATwYuHJ6w5MkSZKk4dV1yN/rgFOBy5L8nmZCiscAfwVe0G6zJfCpaY9QkiRJkoZUpwZVVf0iyRY0E1E8gmZmvxOAL43M6ldVx81YlJIkSZI0hCbzYN+bgCNnMBZJkiRJmlU6N6iSPBjYDngAffdeVdVHpzkuSZIkSRp6nRpUSV4BfA64E1jCPQ/1pV23QSVJkiRpzuk6y997gUOB+VW1eVVt0bNs2fVkSb6f5NYky9rl3J66HZOck+TmJN9LsllPXZIckuSadvlQknR+l5KkOS3JPkkWJbktyTE95U9JckaSa5MsSXJSkgf21B+Y5I6evLUsSee8J0la+XVtUG0EfKaq7pqGc+5TVfPa5REASTYETgbeBawPLAJO7NlnL2BnYBvgsTQzC+49DbFIkuaGy2keUv+5vvL7A0cBmwObATcCn+/b5sSevDWvqi6Y6WAlSbNH13uoTge2BWYqiewCLK6qk6C5IghcnWSrqjoH2AM4tKoubesPBV4LHDFD8UiSViJVdTJAkoXAg3rKv9G7XZLDgR+s2OgkSbNZ1wbVGcAhSbYGzgLu6K0cSVQdfSDJB4FzgXdW1feBrYEze453U5Lz2/Jz+uvb9a0ncU5JkrrYHljcV/bCJNcCVwCHV9Wnx9o5yV40oyrYdNNNZyxISdLw6NqgGpku/R2j1BWwasfjvA04G7gd2A04NcnjgHk0k130Wgqs067Pa1/31s1LkqrqnSDDZCZJmpIkjwUOAF7UU/xlmiGBV9GM1Phqkuur6oTRjlFVR7Xbs3DhwhptG0nSyqXTPVRVtco4S9fGFFX1i6q6sapuq6pjgZ8AzweWAfP7Np9PM5adUernA8v6G1PtOY6qqoVVtXDBggVdQ5MkzWFJHgp8A3hjVf1opLyqzq6qy6vqrqr6KfBx4MWDilOSNHy6TkoxUwoIzfCKbUYKk6wNPIR7hl3cq75d7x+SIUnSpLWzyn4HeF9VfWGCzUfyliRJQMcGVTtt+euSLG6nNd+yLd8vyUs6HmO9JM9NsmaS1dpnW20PfAs4BXh0kl2TrEkz5OJ37YQUAMcBb06ySZKNgX2BYyb1TiVJc1abd9akGaK+ak8u2gT4LvDJqrrPREdJXpTk/m0efDLwBuC/V2z0kqRh1rWH6o3A/jTjwnuvzF0G7NPxGKvTTFm7BLgaeD2wc1WdW1VLgF2Bg4HraMap79az75HAqTQTYvweOI177uuSJGki+wO3APsBr2zX9wf2BLYE3t37rKme/XYDzqMZgn4ccEg7ZF2SJKD7pBT/Cry2qk5LclBP+W/oONte22h60jj13wG2GqOugLe2iyRJk1JVBwIHjlH9nnH2e9lMxCNJWnl07aHajKZnqN8dwFrTF44kSZIkzR5dG1QXAE8Ypfz5NNOgS5IkSdKc03XI30eAw5Pcj+YeqqcmeRXNELx/nqngJEmSJGmYdWpQVdXnk6wGvB+4H/AFmgkp3lBVJ85gfJIkSZI0tLr2UFFVRwNHJ9kQWKWq/jJzYUmSJEnS8Ov6HKpVkqwCUFVXA6sk2TPJ02Y0OkmSJEkaYl0npTiN5rlRJJkHLAI+DPwgye4zFJskSZIkDbWuDaon0jxJHmAX4AbgAcBrgbfMQFySJEmSNPS6NqjWAa5v13cCTqmqO2gaWQ+ZicAkSZIkadh1bVD9GXh6krWB5wJntOXrAzfPRGCSJEmSNOy6zvL3UZqp0pcBFwM/bMu3B86agbgkSZIkaeh1fQ7VkUl+DTwYOKOq/tpWnQ+8a6aCkyRJkqRhNpnnUC2imd0PgCSrV9VpMxKVJEmSJM0CXZ9D9YYku/a8/ixwS5JzkzxixqKTJEmS
pCHWdVKKNwBLAJJsD7wEeDnwv8ChMxOaJEmSJA23rkP+NgEuatdfCJxUVV9Ochbwo5kITJIkSZKGXdceqhuABe363wH/067fAaw53UFJkiRJ0mzQtYfq28DRSX4LPBT4Rlu+NXDhTAQmSZIkScOuaw/V/wV+AmwIvLiqrm3LnwCcMNmTJnlYkluTfLGnbMck5yS5Ocn3kmzWU5ckhyS5pl0+lCSTPa8kaW5Ksk+SRUluS3JMX535R5I0ZV2fQ3UD8PpRyt89xfN+EvjVyIskGwInA3sCpwLvA04EntJushewM7ANUMAZwAXAEVM8vyRpbrkcOAh4LrDWSKH5R5K0vLr2UN0tyd8m2bR3meT+uwHXc899WAC7AIur6qSquhU4ENgmyVZt/R7AoVV1aVVdRjOz4KsnG7skaW6qqpOr6r+Aa/qqzD+SpOXS9TlU6yY5NsktwGU09031Lp0kmQ+8F9i3r2pr4MyRF1V1E3B+W36f+nZ9ayRJWj7Tmn+S7NUOLVy0ZMmSGQhXkjRsuvZQfYRmuMPOwK00z6D6d+BS4KWTON/7gM9W1SV95fOApX1lS4F1xqhfCswbbRy7yUySNAnTln8AquqoqlpYVQsXLFgw2iaSpJVM11n+nge8rKp+lOQu4NdVdWKSK4C9ga9MdIAkjwOeAzx+lOplwPy+svnAjWPUzweWVVX1H6iqjgKOAli4cOF96iVJ6jFt+UeSNDd17aFaD7i4XV8KbNCu/wx4WsdjPBPYHPhzkiuBtwC7JvkNsJimBwyAJGsDD2nL6a9v1xcjSdLyMf9IkpZL1wbV+cCW7fofgN3a4Q67ANeOude9HUWTpB7XLkcAp9HMuHQK8OgkuyZZEzgA+F1VndPuexzw5iSbJNmY5h6sYzqeV5I0xyVZrc0vqwKrJlkzyWqYfyRJy6lrg+oY4LHt+gdphvndDnwYOKTLAarq5qq6cmShGUZxa1UtqaolwK7AwcB1wLbAbj27H0kzne1ZwO9pGmJHdoxdkqT9gVuA/YBXtuv7m38kScur63OoPtaz/t0kjwSeCPypqs6ayomr6sC+198Bthpj2wLe2i6SJE1Km3MOHKPO/CNJmrKuk1LcS1VdzD33VEmSJEmzyub7nTboEOakixoho+gAABWHSURBVD74D4MOYdp1frBvkp2T/DDJ1e3yoyT/NJPBSZIkSdIw69RDlWRf4P00N+ce0xY/FTg+ybuq6iMzE56klZ1XCFe8lfHqoCRJg9J1yN9bgH2q6uiess8l+SXwXpoH/0qSJEnSnNJ1yN884HujlH+vrZMkSZKkOadrg+q/gBePUr4r8LXpC0eSJEmSZo+uQ/7OA/ZL8izgZ23ZU9rlo0nePLJhVX10ekOUJEmSpOHUtUH1apoHHj68XUZcB7ym53UBNqgkSZIkzQldH+y7xUwHIkmSJEmzTefnUEmSJEmS7s0GlSRJkiRNkQ0qSZIkSZoiG1SSJEmSNEVjNqiSfC7JOu369km6zggoSZIkSXPCeD1UrwTWbte/B6w/8+FIkiRJ0uwxXq/TRcDrk3wbCPDUJNeNtmFV/XAGYpMkSZKkoTZeg+rfgaOBt9M8sPeUMbYrYNVpjkuSJEmSht6YDaqq+m/gv5OsB1wLbA38ZUUFJkmSJEnDbsJZ/qrqeuBZwJ+q6prRlq4nS/LFJFckuSHJH5Ps2VO3Y5Jzktyc5HtJNuupS5JDklzTLh9Kksm+WUmS+iVZ1rfcleSwtm7zJNVX/65BxyxJGh6dZu6rqh8kWSPJ7sCjaIb5nQ0cX1W3TeJ8HwD+papuS7IV8P0kvwUuBk4G9gROBd4HnAg8pd1vL2BnYJv23GcAFwBHTOLckiTdR1XNG1lPsjZwFXBS32brVdWdKzQwSdKs0Ok5VEkeBfwR+CiwLU1D52PAH5M8suvJqmpxTwOs2uUhwC7A4qo6qapuBQ4EtmkbXQB7AIdW1aVVdRlwKPDqrueVJKmjF9MMb//RoAORJM0OXR/s
+3Hgf4FNq2q7qtoO2BQ4E/iPyZwwyaeS3AycA1wBnE5zf9aZI9tU1U3A+W05/fXt+tZIkjS99gCOq6rqK784yaVJPp9kw7F2TrJXkkVJFi1ZsmRmI5UkDYWuDaqnA++oqhtGCtr1dwLPmMwJq+p1wDrAdjTD/G4D5gFL+zZd2m7HKPVLgXmj3UdlMpMkTUWSTYEdgGN7iq8GngRsBjyRJi99aaxjVNVRVbWwqhYuWLBgJsOVJA2Jrg2qW4H1Rilft62blKq6q6p+DDwI+DdgGTC/b7P5wI3ten/9fGDZKFcQTWaSpKnaHfhxVV04UlBVy6pqUVXdWVVXAfsAOyXpz1mSpDmqa4PqVODoJE9Psmq7PAM4Evjacpx/NZp7qBbTTDgB3H1T8Eg5/fXt+mIkSZo+u3Pv3qnRjFzIc6ZZSRLQvUH1RuBPNDfp3touP6CZqOJNXQ6Q5AFJdksyr22QPRd4GfBdmocGPzrJrknWBA4AfldV57S7Hwe8OckmSTYG9gWO6Ri7JEnjSvI0YBP6ZvdLsm2SRyRZJckGwCeA71dV/zB1SdIc1XXa9OuBFyV5KPBImitzZ1fVeZM4V9EM7zuCpiF3MfCm9gHCJNkVOBz4IvALYLeefY8EtgTOal9/pi2TJGk67AGcXFU39pVvCbwfeABwA81jO162gmOTJA2xTg2qEW0DajKNqN59l9Dc7DtW/XeArcaoK+Ct7SJJ0rSqqr3HKD8BOGEFhyNJmkW6DvmTJEmSJPWxQSVJkiRJU2SDSpIkSZKmaMIGVZLVkryunV1PkiRJktSasEFVVXcCHwZWn/lwJEmSJGn26Drk7+fAE2YyEEmSJEmabbpOm340cGiSzYBfAzf1VlbVb6Y7MEmSJEkadl0bVMe3f350lLoCVp2ecCRJkiRp9ujaoNpiRqOQJEmSpFmoU4Oqqi6e6UAkSZIkabbp/ByqJM9L8vUkZyd5cFu2Z5IdZy48SZIkSRpenRpUSV4BfBn4E83wv5Ep1FcF3jozoUmSJEnScOvaQ/VW4LVV9f+AO3vKfw48btqjkiRJkqRZoGuD6mHAz0YpXwbMn75wJEmSJGn26Nqguhx4+Cjl2wPnT184kiRJkjR7dG1QHQV8IsnT29cPTrIH8CHg0zMSmSRJkiQNua7Tpn8oybrAGcCawPeA24CPVNUnZzA+SZIkSRpaXR/sS1W9M8nBwKNoerbOrqplMxaZJEmSJA25zs+hahVwK3AzcNdkdkyyRpLPJrk4yY1JfpvkeT31OyY5J8nNSb6XZLOeuiQ5JMk17fKhJJlk7JIkjSrJ95PcmmRZu5zbUzdmfpIkqetzqNZI8h/AtcCZwO+Aa5N8PMmaHc+1GnAJsAOwLvAu4MtJNk+yIXByW7Y+sAg4sWffvYCdgW2AxwIvAPbueF5JkrrYp6rmtcsjADrkJ0nSHNd1yN+ngZ2APbln+vSnAh8A1gH+eaIDVNVNwIE9RV9PciHwRGADYHFVnQSQ5EDg6iRbVdU5wB7AoVV1aVt/KPBa4IiO8UuSNBW7MH5+kiTNcV2H/P0f4DVV9aWquqBdvgT8C/DiqZw4yUY0U7EvBram6fkC7m58nd+W01/frm/NKJLslWRRkkVLliyZSmiSpLnpA0muTvKTJM9syybKT/diDpKkuadrg+om4LJRyi8DbpnsSZOsDnwJOLa9wjcPWNq32VKa3i9GqV8KzBvtPqqqOqqqFlbVwgULFkw2NEnS3PQ2YEtgE5pHhZya5CFMnJ/uxRwkSXNP1wbVYcC7k6w1UtCuv6ut6yzJKsAXgNuBfdriZcD8vk3nAzeOUT8fWFZVNZlzS5I0mqr6RVXdWFW3VdWxwE+A5zNxfpIkzXFj3kOV5Gt9Rc8ELkvyu/b1Y9r91+56srZH6bPARsDzq+qOtmoxzX1SI9utDTykLR+p3wb4Zft6m546SZKmWwFh4vwkSZrjxpuU4pq+11/te33hFM73aeCRwHOqqneo4CnAh5PsCpwG
HAD8rueG3+OANyc5nSbJ7cske8YkSRpNkvWAbYEfAHcCLwW2B95EM7vtePlJkjTHjdmgqqrXTOeJ2ud27A3cBlzZc/vT3lX1pTZZHQ58EfgFsFvP7kfSjG0/q339mbZMkqTltTpwELAVzTMWzwF2rqpzASbIT5KkOa7rtOnLraouphk+MVb9d2iS2Wh1Bby1XSRJmjZVtQR40jj1Y+YnSZI6NaiS3J/mGVLPAh5A32QWVfWAaY9MkiRJkoZc1x6q42ieuXEscBXNfUySJEmSNKd1bVA9E9ihqn4zg7FIkiRJ0qzS9TlU509iW0mSJEmaE7o2kt4IfCDJNklWncmAJEmSJGm26Drk7zxgLeA3AD1TngNQVTayJEmSJM05XRtUJwDrAm/ASSkkSZIkCejeoFoIPLmqfj+TwUiSJEnSbNL1HqqzgfkzGYgkSZIkzTZdG1T7Ax9N8pwkGyVZv3eZyQAlSZIkaVh1HfJ3evvnt7n3/VNpXzsphSRJkqQ5p2uD6lkzGoUkSZIkzUKdGlRV9YOZDkSSJEmSZptODaokTxivvqp+Mz3hSJIkSdLs0XXI3yKae6V6n+jbey+V91BJkiRJmnO6Nqi26Hu9OvB44J3A26c1IkmSJEmaJbreQ3XxKMXnJVkKvBv4xrRGJUmSJEmzQNfnUI3lQuBx0xGIJEmSJM02nRpU/Q/yTbJBkkcDHwDO7XqyJPskWZTktiTH9NXtmOScJDcn+V6SzXrqkuSQJNe0y4eS5D4nkCRpkpKskeSzSS5OcmOS3yZ5Xlu3eZJKsqxnedegY5YkDY+u91Bdzb0noYBmgopLgJdO4nyXAwcBzwXWuvtAyYbAycCewKnA+4ATgae0m+wF7Axs08ZxBnABcMQkzi1J0mhWo8lnOwB/Bp4PfDnJY3q2Wa+q7hxEcJKk4TbVB/v+FVgCnDeZBFNVJwMkWQg8qKdqF2BxVZ3U1h8IXJ1kq6o6B9gDOLSqLm3rDwVeiw0qSdJyqqqbgAN7ir6e5ELgicCvBxKUJGnWGJYH+24NnNlzvpuSnN+Wn9Nf365vPdqBkuxF06PFpptuOlPxSpJWUkk2Ah4OLO4pvjjJyAiJf6+qq8fY1xwkSXPMuPdQjXLv1KjLNMQxD1jaV7YUWGeM+qXAvNHuo6qqo6pqYVUtXLBgwTSEJkmaK5KsDnwJOLYdIXE18CRgM5oeq3Xa+lGZgyRp7pmoh2q0e6f6VYfjTGQZML+vbD5w4xj184FlVTVRbJIkdZJkFeALwO3APgBVtYzm4fYAVyXZB7giyfyqumEwkUqShslEDaH+e6d6/T3wRmA6btJdTHOfFABJ1gYewj3DLRbTTEjxy/b1Ntx7KIYkSVPWjnj4LLAR8PyqumOMTUcu5DnTrCQJmKBBNdq9U0meABwCbA8cSTMjXydJVmvPuSqwapI1aRpkpwAfTrIrcBpwAPC7drgFwHHAm5OcTpPM9gUO63peSZIm8GngkcBzquqWkcIk2wLXA38C7g98Avh+VfUPU5ckzVGdH+ybZIskxwO/AK4FHlVVb6iqJZM43/7ALcB+wCvb9f3bY+wKHAxcB2wL7Naz35E006mfBfyeptF15CTOK0nSqNrnHu5N86D6K3ueN/UKYEvgmzRD0H8P3Aa8bGDBSpKGzoT3PiXZgKbH6F+BnwBPrapF4+81uqo6kHtPTdtb9x1gqzHqCnhru0iSNG2q6mLGH8J3woqKRZI0+0w0y987gPNpHnb4oqp69lQbU5IkSZK0spmoh+ogmmF5lwKvS/K60Taqqn+c7sAkSZIkadhN1KA6jomnTZckSZKkOWmiWf5evYLikCRJkqRZp/Msf5IkSZKke7NBJUmSJElTZINKkiRJkqbIBpUkSZIkTZENKkmSJEmaIhtUkiRJkjRFNqgkSZIkaYpsUEmSJEnSFNmgkiRJkqQpskElSZIkSVNkg0qSJEmSpsgGlSRJkiRNkQ0qSZIkSZoiG1SSJEmSNEWzpkGVZP0k
pyS5KcnFSV4+6JgkSXODOUiSNJbVBh3AJHwSuB3YCHgccFqSM6tq8WDDkiTNAeYgSdKoZkUPVZK1gV2Bd1XVsqr6MfA14FWDjUyStLIzB0mSxjNbeqgeDtxVVX/sKTsT2KF/wyR7AXu1L5clOXcFxKd7bAhcPeggpiKHDDoCzTL+Wx+MzQZwTnPQ7OH/S80V/ltf8cbMP7OlQTUPWNpXthRYp3/DqjoKOGpFBKX7SrKoqhYOOg5ppvlvfU4xB80S/r/UXOG/9eEyK4b8AcuA+X1l84EbBxCLJGluMQdJksY0WxpUfwRWS/KwnrJtAG8GliTNNHOQJGlMs6JBVVU3AScD702ydpKnAy8CvjDYyDQKh7porvDf+hxhDppV/H+pucJ/60MkVTXoGDpJsj7wOeDvgGuA/arq+MFGJUmaC8xBkqSxzJoGlSRJkiQNm1kx5E+SJEmShpENKkmSJEmaIhtUkiRJkjRFNqgkqYMkayQ5OMkFSZa2ZTsl2WfQsUmSVm7moOFmg0rTIsnqSbZL8tL29dpJ1h50XNI0+hjwaOAVwMhsPouBfxtYRJIAc5DmBHPQEHOWPy23JI8BvgbcBjyoquYleT6wR1W9dLDRSdMjyRXAQ6vqpiTXVtX6bfn1VbXegMOT5ixzkOYCc9Bws4dK0+HTwAFVtRVwR1v2A+AZgwtJmna3A6v1FiRZQPNMIkmDYw7SXGAOGmI2qDQdtga+2K4XQFXdBKw1sIik6XcScGySLQCSPBA4HPjPgUYlyRykucAcNMRsUGk6XAQ8sbcgyZOB8wYSjTQz3kHzb/0sYD3gT8DlwHsGGJMkc5DmBnPQEPMeKi23JC8APgscAewLHAz8K/Daqvr2IGOTZkI7zOLq8gtUGjhzkOYac9DwsUGlaZHkCcCewGbAJcDRVfXrwUYlLZ8kW3bZrqoumOlYJI3NHKSVkTlo9rBBJUljSPJXmnsyMs5mVVWrrqCQJElzhDlo9rBBpSlJ8t4u21XVATMdiyRpbjEHSRomq028iTSqBw86AEnSnGUOkjQ07KGSpA6SrAa8DtgB2JCeIRhVtf2g4pIkrfzMQcPNadM1bZKsk2SLJFuOLIOOSZpGHwP2Bn5IM0XzV4EHAN8dZFCSGuYgreTMQUPMHiottySPAr4EbMM9N0+OPFzRGyW1UkhyGfDUqvpzkuurar0kWwFHVtUOg45PmqvMQZoLzEHDzR4qTYdPAd8D1gduAO4PHAnsMcigpGl2P5rpmAFuSXK/qjoHePwAY5JkDtLcYA4aYvZQabkluQ54QFXd0XPVZG3g91W1xaDjk6ZDkp8Cb6qqXyY5FfgDzY+3V1TVIwcbnTR3mYM0F5iDhps9VJoOtwKrt+tXJ9mU5t/WBoMLSZp2bwTubNffDDwBeCGw18AikgTmIM0N5qAhZg+VlluSLwOnV9UxST4I/CNNgvtzVe082OgkSSszc5CkQbNBpWmVZBXg5cA84LiqunnAIUnTJsnmwGNp/n3fraqOH0Q8ku7NHKSVmTloeNmg0nJLsi7wBpobI/v/k+80kKCkaZbk7cABwGLglp6q8hkg0uCYgzQXmIOG22qDDkArhZOAVYFTuPd/cmllsi/wxKo6e9CBSLoXc5DmAnPQELNBpenwFGCDqrpj0IFIM+ga4KJBByHpPsxBmgvMQUPMWf40HX4MOGWnVnZvAo5KsjDJpr3LoAOT5jhzkOYCc9AQ8x4qLbckDwBOB34BXNVbV1XvHUhQ0jRL8iLgaGDDvqqqqlUHEJIkzEGaG8xBw80hf5oOBwMPpumKnt9TbmtdK5NPAe8A/hPv05CGiTlIc4E5aIjZQ6XlluRG4OFVdcWgY5FmSpKrgI2r6q5BxyLpHuYgzQXmoOHmPVSaDhcA3gysld1HgP2SZNCBSLoXc5DmAnPQELOHSsstyVuAXYDDuO/49e8OJChpmiW5BPhb4Haa2ZbuVlXeFCwNiDlIc4E5aLjZoNJyS3LhGFVVVVuu0GCk
GZJkh7HqquoHKzIWSfcwB2kuMAcNNxtUkiRJkjRF3kMlSR0kWSPJwUkuSLK0LdspyT6Djk2StHIzBw03G1SS1M3HgEcDr+Ce6ZgXA/82sIgkSXOFOWiIOeRPkjpIcgXw0Kq6Kcm1VbV+W359Va034PAkSSsxc9Bws4dKkrq5nb6HoSdZQN9sS5IkzQBz0BCzQSVJ3ZwEHJtkC4AkDwQOp3lqvSRJM8kcNMRsUEnSGPpu9j0SuAg4C1gP+BNwOfDeFR+ZJGllZw6aPbyHSpLGkGRpVa3brt9QVfPb9QXA1eUXqCRphpiDZo/VJt5Ekuas85McSjOT0upJXgNkpDJpVqvqc4MJT5K0EjMHzRL2UEnSGJI8HHgrsBnwLOBHo2xWVfXsFRqYJGmlZw6aPWxQSVIHSf6nqnYcdBySpLnHHDTcbFBJkiRJ0hQ5y58kSZIkTZENKkmSJEmaIhtUkiRJkjRFNqgkSZIkaYpsUEmSJEnSFP1/GKOaYeuMtukAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check the sex balance\n",
    "train_sex_value_counts = df_train.sex.value_counts()\n",
    "test_sex_value_counts = df_test.sex.value_counts()\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "train_sex_value_counts.plot.bar()\n",
    "train_sex_ratio = train_sex_value_counts['male']/train_sex_value_counts['female']\n",
    "plt.title(f'Train set: male vs female ratio: {train_sex_ratio:.2f}')\n",
    "plt.ylabel('Number of passengers')\n",
    "\n",
    "plt.subplot(122)\n",
    "test_sex_value_counts.plot.bar()\n",
    "test_sex_ratio = test_sex_value_counts['male']/test_sex_value_counts['female']\n",
    "plt.title(f'Test set: male vs female ratio: {test_sex_ratio:.2f}')\n",
    "\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, lets check that the relative number of passenger per class is similar between the train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.404343Z",
     "start_time": "2020-05-01T17:12:38.078737Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3df7xddX3n+9ebhBoliUgTUX7mij+oMISOcbD3juIM9joqVoc4d/BXQcfGtsN0vDKj3F7AqDAV7jDeacECFUGkP5Be8BfaH7Rota3WOBY0NXqliIBiDzTEJPwQ8TN/rO/RzeacnJ2Vc7I557yej8d6sNb3+11rf/faYX/PZ6/P+q5UFZIkSZKk3bfPuDsgSZIkSfOVAZUkSZIk9WRAJUmSJEk9GVBJkiRJUk8GVJIkSZLUkwGVJEmSJPVkQKXHvCSfSnLKuPuhPZPkhUnuGHc/JEmalGRjkqvG3Q/NbwZUmhNJdgwsP0py/8D2a3fnWFX1kqr64Fz1dViSNUkqydK99ZqSpPGYzfGqHe/TSd40B/08NcnnZvu4kvacfzBqTlTV8sn1JN8C3lRVNwy3S7K0qn64N/um2ZNkSVU9PO5+SFJfo45Xmh/8u0Lj4BUq7VWTaV9J3p7kLuDyJE9K8okkE0m2tvVDBvb58a99k7/QJfmvre2tSV6yi9d7e5I7k2xP8vUkJ7TyfZKckeSWJPck+XCSA9puf9H+e2/7hfLnRnhfG5P8YZKr22v9jyRrB+onX2t7kr9L8q8H6p6e5DNJtiW5O8nVrTxJ3pvkH1rdzUmObnWPa+fg20m+l+TiJI8fOsent32/m+QNA6/300k+nuT7Sb6Y5JzBXz2THJnkT5P8Yztn/8dA3RVJfjvJJ5PsBP7FFOfigCSXJ/lO+4w+Ms05m9VzIkmzaVfjRJJlSa5q5fe279IDk5wLPB+4sI0fF05x3Cn3bXVPTHJZ+96+s30/L0nyM8DFwM+149474nv4dJLfSPI37TvzowNjHUmuSXJXq/uLJEcN1L20fTdvb335T618Vbpx+t42Tnw2yT6t7qAk/1+68fzWJL82cLyN7Rxe2Y65Ocm6gfp/muTLre6adOPpOQP1Jyb52/a6f5XkmIG6b6Ub728GdmaKDJMkRw2Mbd9L8uvTnLNZPSdaHPywNQ5PAQ4ADgc20P07vLxtHwbcDzxqEBpwHPB1YBVwPnBZkgw3SvIs4DTguVW1Angx8K1W/WvAK4HjgYOArcBFre4F7b/7V9XyqvrrJIe1L8rDdtGvVwDXtPf2e8BHkuzb6m6hG2SfCLwTuCrJU1vdu4E/AZ4EHAL8Viv/31tfngnsD/xb4J5Wd14rPxZ4OnAwcPZAX57SXutg4N8BFyV5Uqu7CNjZ2pzSlslzth/wp63/TwZeDbxvcEABXgOcC6wApko/+RDwBOCodoz3Tn26Zv2cSNJs2tU4cQrdd9ehwE8DvwzcX1X/N/BZ4LQ2fpw2xXGn3LfVfRD4Id33+s/Sfee9qaq+1tr9dTvu/gBJXtOCiF35ReCN7T38EPjNgbpPAc+g+67+H8DvDtRdBry5jZ9HA3/eyk8H7gBWAwcCvw5UCyA+DtxEN/acALwlyYsHjvkLwB/QfX9/jDbWJ/kp4DrgCrox9PeBwR/Z/inwAeDN7ZxdAnwsyeMGjv1q4GV0Y/cjrlAlWQHcAPxROw9PB/5smvM1a+dkmuNrIaoqF5c5XeiCmBe19RcCPwCW7aL9scDWge1P0w0oAKcC3xyoewLdl9ZTpjjO04F/AF4E7DtU9zXghIHtpwIP0aXBrmnHXLob73Ej8PmB7X2A7wLPn6b93wKvaOtXApcChwy1+ZfAN4DnAfsMlIcuIDpioOzngFsHzvH9g/1v5+F5wJL2Pp81UHcO8Lm2/m+Bzw714xLgHW39CuDKXZyHpwI/Ap40Rd0LgTt2sW/vc+Li4uIyG8vQeLWr
ceKNwF8Bx0xxjB+PWdO8xpT70v0h/iDw+IGyVwM3tvVTJ7+rd+P9fBp4z8D2s+nG4CVTtN2/jX1PbNvfpgtgVg61exfwUeDpQ+XHAd8eKvu/gMvb+kbghqG+3N/WXwDcCWSg/nPAOW39t4F3Dx3768DxA5/bG3dxHl4NfHmauo3AVdPU7dE5cVk8i1eoNA4TVfXA5EaSJyS5JMltSb5Pl3K3f5Il0+x/1+RKVd3XVpcPN6qqbwJvofuy/Ickf5DkoFZ9OHBdu+p0L93A+TDdgNbX7QOv/SO6X6sOau/xFwdSFe6l+2VrVWv+Nrog6W9aCsQb2zH+nO7Xu4uA7yW5NMlKul/AngB8aeB4f9TKJ91Tj/yF7j66c7Sa7o+B2wfqBtcPB46bPG479mvprmZN1X7YocA/VtXWXbRhDs6JJM22XY0THwL+GPiDdOnN5w9kJMxkun0PB/YFvjvwmpfQXSnZE4Pf2be111iVLpXwPelSGr/PTzI4Jr+H1wMvBW5Ll4I9mf7+/wDfBP4kyd8nOaOVHw4cNDR+/DqPHFfvGli/D1jW0vMOAu6sqsGrOsNj0+lDxz607TdV+2GH0mVF7NIcnBMtEgZUGofhy+CnA88Cjquqlfwk5e5RaXy7/UJVv1dV/5zuy7joUuWg++J9SVXtP7Asq6o7p+jfqA6dXGmpD4cA30lyOPA7dOmHP11dqsZXae+vqu6qql+qqoPofvl6X5Knt7rfrKrn0KXPPRP4z8DddFegjhro+xNr4MbqXZigS/k4ZKDs0IH124HPDJ2X5VX1KwNtdnV+bgcOSLL/rjoxB+dEkmbbtONEVT1UVe+sqmcD/ytwIl1qHcwwhuxi39vprlCtGni9lVU1mXK9x2MTXVr9Q3TjyGvoUtVfRJeCuKa1mfwe/mJVvYIuoPsI8OFWvr2qTq+qpwEvB96a7v7k2+kyJQbP14qqeukIffwucPBQ+v7w2HTu0LGfUFW/P9BmprHpiBH6MdvnRIuEAZUeC1bQBQj3prtZ9h2zcdAkz0ryL1uO9QPtNSZnpLsYOLf9YU+S1Ule0eom6NLWnrabL/mcJCe1X9veQjcwfh7Yj+6LfqK91hvorsZM9vPf5CeTcGxtbR9O8twkx7VfLne29/Bwu/r1O8B7kzy5HePgoTz1KVU3I9+1wMZ2ZfBIfvJHAMAngGcmeX2Sfdvy3HQ3RM+oqr5Ll3/+vnSTjeyb5AVTNJ3VczJK3yRpN007TiT5F0n+Scuk+D5dkDL5XfQ9djF+TLdv+/78E+CCJCvTTYpxRJLjB457SLr7jXbH65I8O8kT6FLT/rCNBSvoxql76LIe/stAH38qyWuTPLGqHmr9fLjVnZhu4qAMlD8M/A3w/XSTQzy+Xe05OslzR+jjX7djnJZkaTvP/2yg/neAX27f/0myX5KXpbs3ahSfAJ6S5C3pJnVakeS4KdrN9jnRImFApceC/xd4PN0vZp+nS1+bDY8D3tOOexfdL0qTs/r8d7obYv8kyfb2usfBj9MIzwX+sqUWPC/dpBQ7sutJKT5Kdw/SVuD1wEntl8i/Ay6gGzC+B/wT4C8H9nsu8IUkO1qf/mNV3QqspBtEttKladwD/Ne2z9vp0gs+39ISbqC7yjeK0+h+ebuLLvXk9+kGEKpqO91N0CcD32ltzqM7l6N6Pd0fCFvo7t16y3CDOTonkjSbph0n6NKg/5Duj+evAZ8BrhrY71XpZjn9TR5tV/v+IvBTwN/Rfc/9Id29W9BNgLAZuCvJ3QDtD/zNM7yPD9Hd/3oXsIxusg3o7lW9je7epb9r72/Q64FvtTHml4HXtfJn0I05O+i+w99XVZ9uQdrL6e6DvpVu7H0/3XizS1X1A+AkukmU7m2v9Ql+MjZtAn6JLuV7K934d+pMxx04/nbg51v/7gL+f6aYpZZZPiej9k/zXx6ZriqpjyQb6W5Gfd1MbR9rkpxHN6nHKTM2liTNG0k+TTfhwvvH3ZfdleQLwMVVdfm4
+yLNxCtU0iKT7jlTx7S0iX9G94vgdePulyRp8UpyfJKntJS/U4BjmL2MFWlOPerBZ5IWvBV0aX4H0aXkXUCXrihJ0rg8i26Sh+V0M/K9qt1XJj3mjZzyl+RkuskCDqPLPz21qj7bZjG5qJV/oZXf1vYJ3T0sb2qHuQx4e5lnKEmSJGkBGCnlL8nP092Y/ga6X7dfAPx9klV0M4adRfdk603A1QO7bqB7yvhauku3J9JNgSxJkiRJ896o91C9E3hXVX2+qn7UnsFwJ92MLJur6pr2oNaNwNo2FTPAKcAFVXVHa38BuzEriyRJM0lyWpJNSR5McsU0bd6RpJK8aKAsSc5Lck9bzm+ZFZIkjWzGe6jacxLWAR9L8k26KTc/QvcwzaOAmybbVtXOJLe08i3D9W39KGawatWqWrNmzejvQpI0733pS1+6u6pW99j1O8A5wIvpHsHwCEmOAF5F9/DQQYNZFAX8KfD3dM8f2iXHKUlaXHY1Ro0yKcWBwL50g9Hz6Z4v81HgTLobByeG2m+jSwuk1W8bqlueJMP3USXZQDe4cdhhh7Fp06YRuiZJWiiS3NZnv6q6tu2/DjhkiiYX0j277X1D5T/Oomj7X0D3rJsZA6o1a9Y4TknSIrKrMWqUlL/7239/q6q+W1V3A/8NeCndA8xWDrVfCWxv68P1K4EdU01KUVWXVtW6qlq3enWfHyglSXqkJP8G+EFVfXKK6l5ZFJIkDZoxoKqqrcAddOkQwzbTpUoAkGQ/4IhW/qj6tj7TE70lSdpjSZYD/wV4yzRNps2imOZ4G9q9WpsmJoaTMyRJi9Wok1JcDvyHJE9O8iS6wekTdA8DPTrJ+iTLgLOBm6tqS9vvSuCtSQ5OchBwOnDFrL4DSZKm9k7gQ1V16zT1I2dRgJkUkqSpjRpQvRv4IvAN4GvAl4Fzq2oCWA+cC2wFjgNOHtjvEuDjwFeArwLXtzJJkubaCcCvJbkryV3AocCHk7y91ZtFIUnaY6NMSkFVPQT8aluG624AjnzUTl1dAW9riyRJsy7JUrrxbAmwpGVM/JAuoNp3oOkXgbcCn2rbk1kUn6RLaz8d+K291W9J0sIwUkAlSdJj2JnAOwa2Xwe8s6o2DjZK8jCwtap2tKJLgKfRZVEAvB+zKCRJu8mASpI0r7XAaeMI7dYMbZtFIUnaY6PeQyVJkiRJGmJAJUmSJEk9mfI3hTVnXD/uLozVt97zsnF3QZI0DccoxyhJjy1eoZIkSZKkngyoJEmSJKknAypJkiRJ6smASpIkSZJ6MqCSJEmSpJ4MqCRJkiSpJwMqSZIkSerJgEqSJEmSejKgkiRJkqSeDKgkSZIkqScDKkmSJEnqyYBKkiRJknoyoJIkSZKkngyoJEmSJKknAypJkiRJ6smASpIkSZJ6MqCSJEmSpJ4MqCRJkiSpJwMqSZIkSerJgEqSJEmSejKgkiTNa0lOS7IpyYNJrhgof16SP03yj0kmklyT5KkD9UlyXpJ72nJ+kozlTUiS5i0DKknSfPcd4BzgA0PlTwIuBdYAhwPbgcsH6jcArwTWAscAJwJvnuO+SpIWmKXj7oAkSXuiqq4FSLIOOGSg/FOD7ZJcCHxmoOgU4IKquqPVXwD8EnDxXPdZkrRweIVKkrRYvADYPLB9FHDTwPZNrWxKSTa01MJNExMTc9RFSdJ8M1JAleTTSR5IsqMtXx+oOyHJliT3JbkxyeEDdeanS5LGLskxwNnAfx4oXg5sG9jeBiyfbpyqqkural1VrVu9evXcdVaSNK/szhWq06pqeVueBZBkFXAtcBZwALAJuHpgH/PTJUljleTpwKeA/1hVnx2o2gGsHNheCeyoqtqb/ZMkzW97mvJ3ErC5qq6pqgeAjcDaJEe2+h/np1fVncAFwKl7+JqSJI2kZU3cALy7qj40VL2Z7ge/SWt5ZEqgJEkz2p2A6jeS3J3kL5O8sJU9Iv+8qnYCt/CTHPSR89PNTZck9ZFkaZJlwBJgSZJl
rexg4M+Bi6pqqokmrgTemuTgJAcBpwNX7LWOS5IWhFEDqrcDTwMOppuC9uNJjuDR+ee07RVtfeT8dHPTJUk9nQncD5wBvK6tnwm8iW7sesfAPcA7Bva7BPg48BXgq8D1rUySpJGNNG16VX1hYPODSV4NvJRH55/Ttre3dfPTJUlzqqo20qWcT+Wdu9ivgLe1RZKkXvreQ1VAGMo/T7IfcAQ/yUE3P12SJEnSgjVjQJVk/yQvHshJfy3dszz+GLgOODrJ+pa/fjZwc1Vtabubny5JkiRpwRol5W9f4BzgSOBhYAvwyqr6OkCS9cCFwFXAF4CTB/a9hC5//Stt+/2Yny5JkiRpgZgxoKqqCeC5u6i/gS7YmqrO/HRJkiRJC9aePodKkiRJkhYtAypJkiRJ6smASpIkSZJ6MqCSJEmSpJ4MqCRJkiSpJwMqSZIkSerJgEqSJEmSejKgkiRJkqSeDKgkSZIkqScDKkmSJEnqyYBKkiRJknoyoJIkSZKkngyoJEmSJKknAypJkiRJ6smASpIkSZJ6MqCSJEmSpJ4MqCRJkiSpJwMqSZIkSerJgEqSNK8lOS3JpiQPJrliqO6EJFuS3JfkxiSHD9QlyXlJ7mnL+Umy19+AJGleM6CSJM133wHOAT4wWJhkFXAtcBZwALAJuHqgyQbglcBa4BjgRODNe6G/kqQFxIBKkjSvVdW1VfUR4J6hqpOAzVV1TVU9AGwE1iY5stWfAlxQVXdU1Z3ABcCpe6nbkqQFwoBKkrRQHQXcNLlRVTuBW1r5o+rb+lFIkrQbDKgkSQvVcmDbUNk2YMU09duA5dPdR5VkQ7tXa9PExMSsd1aSND8ZUEmSFqodwMqhspXA9mnqVwI7qqqmOlhVXVpV66pq3erVq2e9s5Kk+cmASpK0UG2mm3ACgCT7AUe08kfVt/XNSJK0GwyoJEnzWpKlSZYBS4AlSZYlWQpcBxydZH2rPxu4uaq2tF2vBN6a5OAkBwGnA1eM4S1IkuYxAypJ0nx3JnA/cAbwurZ+ZlVNAOuBc4GtwHHAyQP7XQJ8HPgK8FXg+lYmSdLIlo67A5Ik7Ymq2kg3JfpUdTcAR05TV8Db2iJJUi+7dYUqyTOSPJDkqoEyn0IvSZIkaVHa3ZS/i4AvTm74FHpJkiRJi9nIAVWSk4F7gT8bKPYp9JIkSZIWrZECqiQrgXfRzYA0yKfQS5IkSVq0Rr1C9W7gsqq6fah81p5C7xPoJUmSJM03MwZUSY4FXgS8d4rqWXsKvU+glyRJkjTfjDJt+guBNcC324Wl5XQPTnw2cDHdfVLALp9C/zdt26fQS5IkSVowRkn5u5QuSDq2LRfTPfzwxfgUekmSJEmL2IxXqKrqPuC+ye0kO4AH2hPoSbIeuBC4CvgCj34K/dPonkIP8H58Cr0kSZKkBWKUlL9HaE+kH9z2KfSSJEmSFqXdfbCvJEmSJKkxoJIkSZKkngyoJEmSJKknAypJkiRJ6smASpIkSZJ6MqCSJEmSpJ4MqCRJkiSpJwMqSZIkSerJgEqSJEmSejKgkiRJkqSeDKgkSZIkqScDKkmSJEnqyYBKkiRJknoyoJIkLWhJ1iT5ZJKtSe5KcmGSpa3uhCRbktyX5MYkh4+7v5Kk+cWASpK00L0P+AfgqcCxwPHAryZZBVwLnAUcAGwCrh5XJyVJ85MBlSRpoftfgA9X1QNVdRfwR8BRwEnA5qq6pqoeADYCa5McOb6uSpLmGwMqSdJC99+Bk5M8IcnBwEv4SVB102SjqtoJ3NLKJUkaiQGVJGmh+wxdkPR94A661L6PAMuBbUNttwErpjpIkg1JNiXZNDExMYfdlSTNJwZUkqQFK8k+wB/T3Su1H7AKeBJwHrADWDm0y0pg+1THqqpLq2pdVa1bvXr13HVakjSvGFBJkhayA4BDgQur6sGquge4HHgpsBlYO9kwyX7AEa1ckqSRGFBJkhasqrobuBX4lSRLk+wPnEJ379R1wNFJ1idZBpwN
3FxVW8bXY0nSfGNAJUla6E4C/hUwAXwT+CHwf1bVBLAeOBfYChwHnDyuTkqS5qel4+6AJElzqar+FnjhNHU3AE6TLknqzStUkiRJktSTAZUkSZIk9WRAJUmSJEk9GVBJkiRJUk8GVJIkSZLU00gBVZKrknw3yfeTfCPJmwbqTkiyJcl9SW5McvhAXZKcl+SetpyfJHPxRiRJkiRpbxv1CtVvAGuqaiXwC8A5SZ6TZBVwLXAW3dPoNwFXD+y3AXgl3ZPojwFOBN48S32XJEmSpLEaKaCqqs1V9eDkZluOoHtY4uaquqaqHgA2AmuTTD7T4xTggqq6o6ruBC4ATp3F/kuSJEnS2Iz8YN8k76MLhh4PfBn4JN3T5W+abFNVO5PcAhwFbGn/vWngMDe1MkmSJO2mNWdcP+4ujNW33vOycXdBepSRJ6Woql8FVgDPp0vzexBYDmwbarqttWOK+m3A8qnuo0qyIcmmJJsmJiZGfweSJEmSNCa7NctfVT1cVZ8DDgF+BdgBrBxqthLY3taH61cCO6qqpjj2pVW1rqrWrV69ene6JUmSJElj0Xfa9KV091BtpptwAoAk+w2UM1zf1jcjSZIkSQvAjAFVkicnOTnJ8iRLkrwYeDXw58B1wNFJ1idZBpwN3FxVW9ruVwJvTXJwkoOA04Er5uSdSJIkSdJeNsqkFEWX3ncxXQB2G/CWqvooQJL1wIXAVcAXgJMH9r0EeBrwlbb9/lYmSZIkSfPejAFVVU0Ax++i/gbgyGnqCnhbWyRJkiRpQel7D5UkSZIkLXoGVJIkSZLUkwGVJEmSJPVkQCVJkiRJPRlQSZIkSVJPBlSSJEmS1JMBlSRJkiT1ZEAlSZIkST0ZUEmSFrwkJyf5WpKdSW5J8vxWfkKSLUnuS3JjksPH3VdJ0vxiQCVJWtCS/DxwHvAGYAXwAuDvk6wCrgXOAg4ANgFXj6ufkqT5aem4OyBJ0hx7J/Cuqvp8274TIMkGYHNVXdO2NwJ3JzmyqraMpaeSpHnHK1SSpAUryRJgHbA6yTeT3JHkwiSPB44CbppsW1U7gVta+VTH2pBkU5JNExMTe6P7kqR5wCtU0pA1Z1w/7i6Mzbfe87Jxd0GabQcC+wKvAp4PPAR8FDgTWA4MR0bb6NICH6WqLgUuBVi3bl3NUX8lSfOMV6gkSQvZ/e2/v1VV362qu4H/BrwU2AGsHGq/Eti+F/snSZrnDKgkSQtWVW0F7gCmuqK0GVg7uZFkP+CIVi5J0kgMqCRJC93lwH9I8uQkTwLeAnwCuA44Osn6JMuAs4GbnZBCkrQ7DKgkSQvdu4EvAt8AvgZ8GTi3qiaA9cC5wFbgOODkcXVSkjQ/OSmFJGlBq6qHgF9ty3DdDcCRe71TkqQFw4BKkiRJmgecifixyZQ/SZIkSerJgEqSJEmSejKgkiRJkqSeDKgkSZIkqScDKkmSJEnqyVn+JKlZzLMnwWN7BiVJkh6rvEIlSZIkST0ZUEmSJElSTwZUkiRJktSTAZUkSZIk9TRjQJXkcUkuS3Jbku1JvpzkJQP1JyTZkuS+JDcmOXygLknOS3JPW85Pkrl6M5IkSZK0N41yhWopcDtwPPBE4Czgw0nWJFkFXNvKDgA2AVcP7LsBeCWwFjgGOBF486z1XpIkSZLGaMZp06tqJ7BxoOgTSW4FngP8NLC5qq4BSLIRuDvJkVW1BTgFuKCq7mj1FwC/BFw8m29CkiRJksZht++hSnIg8ExgM3AUcNNkXQu+bmnlDNe39aOYQpINSTYl2TQxMbG73ZIkSZKkvW63Aqok+wK/C3ywXYFaDmwbarYNWNHWh+u3Acunuo+qqi6tqnVVtW716tW70y1JkiRJGouRA6ok+wAfAn4AnNaKdwArh5quBLZPU78S2FFV1au3kiRJkvQYMlJA1a4oXQYcCKyvqoda1Wa6CScm2+0HHNHKH1Xf1jcjSZIkSQvAqFeofhv4GeDlVXX/QPl1wNFJ1idZBpwN3NzS
AQGuBN6a5OAkBwGnA1fMTtclSZIkabxGeQ7V4XRTnR8L3JVkR1teW1UTwHrgXGArcBxw8sDulwAfB74CfBW4vpVJkiRJ0rw3yrTptwHTPoy3qm4AjpymroC3tUWSJEmSFpTdnjZdkiRJktQxoJIkLQpJnpHkgSRXDZSdkGRLkvuS3NjS3CVJGpkBlSRpsbgI+OLkRpJVwLXAWcABwCbg6vF0TZI0XxlQSZIWvCQnA/cCfzZQfBKwuaquqaoHgI3A2iRT3hcsSdJUDKgkSQtakpXAu+ge3THoKOCmyY2q2gnc0solSRqJAZUkaaF7N3BZVd0+VL4c2DZUtg1YMdVBkmxIsinJpomJiTnopiRpPjKgkiQtWEmOBV4EvHeK6h3AyqGylcD2qY5VVZdW1bqqWrd69erZ7agkad6a8TlUkiTNYy8E1gDfTgLdVaklSZ4NXAycMtkwyX7AEcDmvd5LSdK85RUqSdJCdildkHRsWy4GrgdeDFwHHJ1kfZJlwNnAzVW1ZVydlSTNP16hkiQtWFV1H3Df5HaSHcADVTXRttcDFwJXAV8ATh5HPyVJ85cBlSRp0aiqjUPbNwBOky5J6s2UP0mSJEnqyYBKkiRJknoyoJIkSZKkngyoJEmSJKknAypJkiRJ6smASpIkSZJ6MqCSJEmSpJ4MqCRJkiSpJwMqSZIkSerJgEqSJEmSejKgkiRJkqSeDKgkSZIkqScDKkmSJEnqyYBKkiRJknoyoJIkSZKkngyoJEmSJKknAypJkiRJ6mmkgCrJaUk2JXkwyRVDdSck2ZLkviQ3Jjl8oC5JzktyT1vOT5JZfg+SJEmSNBajXqH6DnAO8IHBwiSrgGuBs4ADgE3A1QNNNgCvBNYCxwAnAm/esy5LkiRJ0mPDSAFVVV1bVR8B7hmqOgnYXFXXVNUDwEZgbZIjW/0pwAVVdUdV3QlcAJw6Kz2XJEmSpDHb03uojgJumtyoqp3ALa38UfVt/SgkSZIkaQHY04BqObBtqGwbsGKa+m3A8qnuo0qyod2ntWliYmIPuyVJkiRJc29PA6odwMqhspXA9mnqVwI7qqqGD1RVl1bVuqpat3r16j3sliRJkiTNvT0NqDbTTTgBQJL9gCNa+aPq2/pmJEnaC3UuYUMAAAWtSURBVJI8LsllSW5Lsj3Jl5O8ZKB+2plqJUkaxajTpi9NsgxYAixJsizJUuA64Ogk61v92cDNVbWl7Xol8NYkByc5CDgduGLW34UkSVNbCtwOHA88kW5W2g8nWTPCTLWSJM1o1CtUZwL3A2cAr2vrZ1bVBLAeOBfYChwHnDyw3yXAx4GvAF8Frm9lkiTNuaraWVUbq+pbVfWjqvoEcCvwHGaeqVaSpBktHaVRVW2kG2imqrsBmHLwafdKva0tkiSNVZIDgWfSpZ//CkMz1SaZnKl2y9RHkCTpkfb0HipJkuaFJPsCvwt8sKWmzzRT7fD+zkYrSXoUAypJ0oKXZB/gQ8APgNNa8Uwz1T6Cs9FKkqZiQCVJWtDasw8vAw4E1lfVQ61qpplqJUmakQGVJGmh+23gZ4CXV9X9A+UzzVQrSdKMDKgkSQtWe67Um4FjgbuS7GjLa0eYqVaSpBmNNMufJEnzUVXdBmQX9dPOVCtJ0ii8QiVJkiRJPRlQSZIkSVJPBlSSJEmS1JMBlSRJkiT1ZEAlSZIkST0ZUEmSJElSTwZUkiRJktSTAZUkSZIk9WRAJUmSJEk9GVBJkiRJUk8GVJIkSZLUkwGVJEmSJPVkQCVJkiRJPRlQSZIkSVJPBlSSJEmS1JMBlSRJkiT1ZEAlSZIkST0ZUEmSJElSTwZUkiRJktSTAZUkSZIk9WRAJUmSJEk9GVBJkiRJUk8GVJIkSZLU05wHVEkOSHJdkp1Jbkvymrl+TUmSRuU4JUnaE0v3wmtcBPwAOBA4Frg+yU1VtXkvvLYkSTNxnJIk9TanV6iS7AesB86qqh1V9TngY8Dr5/J1JUkaheOUJGlPparm
7uDJzwJ/VVWPHyj7T8DxVfXyobYbgA1t81nA1+esY499q4C7x90JjYWf/eK22D//w6tq9d58QcepXhb7v9PFzs9/8Vrsn/20Y9Rcp/wtB7YNlW0DVgw3rKpLgUvnuD/zQpJNVbVu3P3Q3udnv7j5+Y+F49Ru8t/p4ubnv3j52U9vriel2AGsHCpbCWyf49eVJGkUjlOSpD0y1wHVN4ClSZ4xULYW8EZfSdJjgeOUJGmPzGlAVVU7gWuBdyXZL8n/BrwC+NBcvu4CsOhTShYxP/vFzc9/L3Oc6sV/p4ubn//i5Wc/jTmdlAK653sAHwB+HrgHOKOqfm9OX1SSpBE5TkmS9sScB1SSJEmStFDN9T1UkiRJkrRgGVBJkiRJUk8GVGOW5GeTvCrJE5IsSXJakvcmOXHcfZM0d5IcluRfJ3nmFHWvHkefpKk4TkmLk+PU6AyoxijJvwM+Cfwm8BfA24Gj6B40+ftJ3jjG7mmM2h8tZ4+7H5obSf4V8FVgI/C3Sd6XZMlAk0vG0jFpiOOUpuM4tbA5Tu0eJ6UYoyRbgF8AAnwN+OdV9Vet7sXA+VW1doxd1JgkeRxwX1UtmbGx5p0kXwLOrqrrkxwIXAU8CJxUVT9Isr2qVoy3l5LjlKbnOLWwOU7tHgOqMUqyraqe2NZ3AsurfSBJ9gH+sar2H2cfNXeSfGAX1UuB1zpQLUyD/++37aV0g9Uquj9ev+dApccCx6nFzXFq8XKc2j2m/I3XziT7tvUr6pHR7eOBH42hT9p7XgPcD9w5xXLHGPulubc1yaGTG1X1Q+DVwLeBGwD/QNFjhePU4uY4tXg5Tu2GpePuwCL3Z8DTga9V1b8fqjsRuHnvd0l70VeAP66qjw1XJFkGnLH3u6S95AbgDcC7JgvaH6pvTHIx8LxxdUwa4ji1uDlOLV6OU7vBlL/HqCSr6f7t3j3uvmhuJPn3wJ1V9ZEp6pYAZ1bVO/d+zzTXkvwUsLSq7pum/rCq+vZe7pa0WxynFj7HqcXLcWr3GFBJkiRJUk/eQyVJkiRJPRlQSZIkSVJPBlSSJEmS1JMBlSRJkiT1ZEAlSZIkST39T3dYNG7egGcIAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check the class balance\n",
    "train_pclass_value_counts = df_train.pclass.value_counts()\n",
    "test_pclass_value_counts = df_test.pclass.value_counts()\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "plt.title('Train set: passenger class')\n",
    "train_pclass_value_counts.plot.bar()\n",
    "\n",
    "plt.subplot(122)\n",
    "plt.title('Test set: passenger class')\n",
    "test_pclass_value_counts.plot.bar()\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the above diagnostics, we are satisfied that, at least in these few categories, the train and test sets are similar enough, and we can move forward.\n",
    "\n",
    "## Feature engineering\n",
    "\n",
    "In this section we will use `vaex` to create meaningful features that will be used to train a classification model. To start with, let's get a high level overview of the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.527108Z",
     "start_time": "2020-05-01T17:12:38.408602Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pclass</th>\n",
       "      <th>survived</th>\n",
       "      <th>name</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>ticket</th>\n",
       "      <th>fare</th>\n",
       "      <th>cabin</th>\n",
       "      <th>embarked</th>\n",
       "      <th>boat</th>\n",
       "      <th>body</th>\n",
       "      <th>home_dest</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>dtype</th>\n",
       "      <td>int64</td>\n",
       "      <td>bool</td>\n",
       "      <td>str</td>\n",
       "      <td>str</td>\n",
       "      <td>float64</td>\n",
       "      <td>int64</td>\n",
       "      <td>int64</td>\n",
       "      <td>str</td>\n",
       "      <td>float64</td>\n",
       "      <td>str</td>\n",
       "      <td>str</td>\n",
       "      <td>str</td>\n",
       "      <td>float64</td>\n",
       "      <td>str</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>841</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1046</td>\n",
       "      <td>233</td>\n",
       "      <td>1046</td>\n",
       "      <td>380</td>\n",
       "      <td>102</td>\n",
       "      <td>592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>206</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>814</td>\n",
       "      <td>1</td>\n",
       "      <td>667</td>\n",
       "      <td>945</td>\n",
       "      <td>455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.3075453677172875</td>\n",
       "      <td>0.3744030563514804</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>29.565299286563608</td>\n",
       "      <td>0.5100286532951289</td>\n",
       "      <td>0.3982808022922636</td>\n",
       "      <td>--</td>\n",
       "      <td>32.926091013384294</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>159.6764705882353</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>0.833269</td>\n",
       "      <td>0.483968</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>14.162</td>\n",
       "      <td>1.07131</td>\n",
       "      <td>0.890852</td>\n",
       "      <td>--</td>\n",
       "      <td>50.6783</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>96.2208</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>0.1667</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>--</td>\n",
       "      <td>0</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>1</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>3</td>\n",
       "      <td>True</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>80</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>--</td>\n",
       "      <td>512.329</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>327</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   pclass            survived  name   sex                 age  \\\n",
       "dtype               int64                bool   str   str             float64   \n",
       "count                1047                1047  1047  1047                 841   \n",
       "NA                      0                   0     0     0                 206   \n",
       "mean   2.3075453677172875  0.3744030563514804    --    --  29.565299286563608   \n",
       "std              0.833269            0.483968    --    --              14.162   \n",
       "min                     1               False    --    --              0.1667   \n",
       "max                     3                True    --    --                  80   \n",
       "\n",
       "                    sibsp               parch ticket                fare  \\\n",
       "dtype               int64               int64    str             float64   \n",
       "count                1047                1047   1047                1046   \n",
       "NA                      0                   0      0                   1   \n",
       "mean   0.5100286532951289  0.3982808022922636     --  32.926091013384294   \n",
       "std               1.07131            0.890852     --             50.6783   \n",
       "min                     0                   0     --                   0   \n",
       "max                     8                   9     --             512.329   \n",
       "\n",
       "      cabin embarked boat               body home_dest  \n",
       "dtype   str      str  str            float64       str  \n",
       "count   233     1046  380                102       592  \n",
       "NA      814        1  667                945       455  \n",
       "mean     --       --   --  159.6764705882353        --  \n",
       "std      --       --   --            96.2208        --  \n",
       "min      --       --   --                  1        --  \n",
       "max      --       --   --                327        --  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imputing\n",
    "\n",
    "The overview shows that several columns have missing data. Our first task will be to impute the missing values in four of them with suitable substitutes. This is our strategy:\n",
    "\n",
    "- age: impute with the median age value;\n",
    "- fare: impute with the mean of the 5 most common fare values;\n",
    "- cabin: impute with \"M\" for \"Missing\";\n",
    "- embarked: impute with the most common value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.546371Z",
     "start_time": "2020-05-01T17:12:38.529144Z"
    }
   },
   "outputs": [],
   "source": [
    "# Handle missing values\n",
    "\n",
    "# Age: impute with the (approximate) median of the training set\n",
    "median_age = df_train.percentile_approx(expression='age', percentage=50.0)\n",
    "df_train['age'] = df_train.age.fillna(value=median_age)\n",
    "\n",
    "# Fare: the mean of the 5 most common ticket prices.\n",
    "fill_fares = df_train.fare.value_counts(dropna=True)\n",
    "fill_fare = fill_fares.iloc[:5].index.values.mean()\n",
    "df_train['fare'] = df_train.fare.fillna(value=fill_fare)\n",
    "\n",
    "# Cabin: this is a string column so let's mark it as \"M\" for \"Missing\"\n",
    "df_train['cabin'] = df_train.cabin.fillna(value='M')\n",
    "\n",
    "# Embarked: impute the missing values with the most common value\n",
    "fill_embarked = df_train.embarked.value_counts(dropna=True).index[0]\n",
    "df_train['embarked'] = df_train.embarked.fillna(value=fill_embarked)"
   ]
  },
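  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the fare rule above concrete, here is a minimal plain-Python sketch of \"the mean of the 5 most common values\". The fares below are made up for illustration, and the helper name `mean_of_most_common` is hypothetical (it is not part of `vaex`). Note that the mean is taken over the 5 distinct values themselves, not weighted by how often each occurs.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "from statistics import mean\n",
    "\n",
    "def mean_of_most_common(values, n=5):\n",
    "    # Count the non-missing values and average the n most frequent ones\n",
    "    counts = Counter(v for v in values if v is not None)\n",
    "    return mean(value for value, _ in counts.most_common(n))\n",
    "\n",
    "# Made-up fares; None marks a missing value\n",
    "fares = [8.05, 8.05, 8.05, 13.0, 13.0, 7.75, 7.75, 26.0, 26.0, 10.5, 10.5, 7.25, None]\n",
    "fill_fare = mean_of_most_common(fares)\n",
    "fares_filled = [fill_fare if f is None else f for f in fares]\n",
    "```"
   ]
  },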
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### String processing\n",
    "\n",
    "Next up, let's engineer some new, more meaningful features out of the \"raw\" data that is present in the dataset. \n",
    "Starting with the names of the passengers, we will extract the titles and count the number of words each name contains. These features can be a loose proxy for the age and status of the passengers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.587351Z",
     "start_time": "2020-05-01T17:12:38.548452Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = name_title\n",
       "Length: 1,047 dtype: str (column)\n",
       "---------------------------------\n",
       "   0      Mr\n",
       "   1      Mr\n",
       "   2     Mrs\n",
       "   3    Miss\n",
       "   4      Mr\n",
       "    ...     \n",
       "1042  Master\n",
       "1043     Mrs\n",
       "1044  Master\n",
       "1045      Mr\n",
       "1046      Mr"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = name_num_words\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  3\n",
       "   1  4\n",
       "   2  5\n",
       "   3  4\n",
       "   4  4\n",
       "  ...  \n",
       "1042  4\n",
       "1043  6\n",
       "1044  4\n",
       "1045  4\n",
       "1046  3"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Engineer features from the names\n",
    "\n",
    "# Titles: keep only the capitalised word that directly precedes a period, e.g. \"Mr\"\n",
    "df_train['name_title'] = df_train['name'].str.replace(r'.* ([A-Z][a-z]+)\\..*', r\"\\1\", regex=True)\n",
    "display(df_train['name_title'])\n",
    "\n",
    "# Number of words in the name\n",
    "df_train['name_num_words'] = df_train['name'].str.count(\"[ ]+\", regex=True) + 1\n",
    "display(df_train['name_num_words'])"
   ]
  },
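  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the two patterns above do, the same logic can be sketched with the standard library `re` module on two names taken from the training set. This is only an illustration in plain Python and assumes the `vaex` string methods follow the usual regular expression semantics.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "names = ['Stoytcheff, Mr. Ilia', 'Abbott, Mrs. Stanton (Rosa Hunt)']\n",
    "\n",
    "# Replace the whole name with the capitalised word that directly precedes a period\n",
    "titles = [re.sub(r'.* ([A-Z][a-z]+)\\..*', r'\\1', name) for name in names]\n",
    "\n",
    "# Number of words = number of runs of spaces + 1\n",
    "num_words = [len(re.findall(r'[ ]+', name)) + 1 for name in names]\n",
    "\n",
    "print(titles, num_words)\n",
    "```"
   ]
  },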
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the cabin column, we will engineer 3 features:\n",
    " - \"deck\": the deck on which the cabin is located, which is encoded as the first character of each cabin value;\n",
    " - \"multi_cabin\": a boolean feature indicating whether a passenger is allocated more than one cabin;\n",
    " - \"has_cabin\": since many values in the original cabin column are missing, we build a feature which tells us whether a passenger had an assigned cabin or not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.747634Z",
     "start_time": "2020-05-01T17:12:38.594540Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = deck\n",
       "Length: 1,047 dtype: str (column)\n",
       "---------------------------------\n",
       "   0  M\n",
       "   1  B\n",
       "   2  M\n",
       "   3  M\n",
       "   4  M\n",
       "  ...  \n",
       "1042  M\n",
       "1043  M\n",
       "1044  M\n",
       "1045  B\n",
       "1046  M"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = multi_cabin\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  0\n",
       "   1  0\n",
       "   2  0\n",
       "   3  0\n",
       "   4  0\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  1\n",
       "1046  0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = has_cabin\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  0\n",
       "   1  1\n",
       "   2  0\n",
       "   3  0\n",
       "   4  0\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  1\n",
       "1046  0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#  Extract the deck\n",
    "df_train['deck'] = df_train.cabin.str.slice(start=0, stop=1)\n",
    "display(df_train['deck'])\n",
    "\n",
    "# Passengers with several cabins booked under their name; these are all 1st class passengers\n",
    "df_train['multi_cabin'] = ((df_train.cabin.str.count(pat='[A-Z]', regex=True) > 1) &\\\n",
    "                           ~(df_train.deck == 'F')).astype('int')\n",
    "display(df_train['multi_cabin'])\n",
    "\n",
    "# Cabin has the most missing values, so let's create a feature tracking if a passenger had a cabin.\n",
    "# Missing cabins were imputed with 'M' above, so we compare against that marker instead of using notna()\n",
    "df_train['has_cabin'] = (df_train.cabin != 'M').astype('int')\n",
    "display(df_train['has_cabin'])"
   ]
  },
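  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"deck\" and \"multi_cabin\" rules can be sketched in plain Python as follows. The helper `cabin_features` is hypothetical and only mirrors the expressions above. The values `'B24'`, `'B58 B60'` and `'M'` appear in the data shown in this notebook, while `'F G73'` is an assumed example of a deck F value that contains two capital letters and is therefore excluded from the multi-cabin flag.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "def cabin_features(cabin):\n",
    "    # The deck is the first character of the cabin string\n",
    "    deck = cabin[:1]\n",
    "    # More than one capital letter suggests more than one cabin, except on deck F\n",
    "    n_letters = len(re.findall(r'[A-Z]', cabin))\n",
    "    multi_cabin = int(n_letters > 1 and deck != 'F')\n",
    "    return deck, multi_cabin\n",
    "\n",
    "examples = {c: cabin_features(c) for c in ['B24', 'B58 B60', 'F G73', 'M']}\n",
    "print(examples)\n",
    "```"
   ]
  },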
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### More features\n",
    "\n",
    "There are two features that give an indication whether a passenger is travelling alone, or with a family. \n",
    "These are the \"sibsp\" and \"parch\" columns, which tell us the number of siblings or spouses and the number of parents or children each passenger has on board, respectively. We are going to use this information to build two columns:\n",
    " - \"family_size\": the size of the family of each passenger, including the passenger;\n",
    " - \"is_alone\": an additional boolean feature which indicates whether a passenger is travelling without their family. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.813132Z",
     "start_time": "2020-05-01T17:12:38.750219Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = family_size\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  1\n",
       "   1  1\n",
       "   2  3\n",
       "   3  4\n",
       "   4  1\n",
       "  ...  \n",
       "1042  8\n",
       "1043  2\n",
       "1044  3\n",
       "1045  2\n",
       "1046  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = is_alone\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  1\n",
       "   1  1\n",
       "   2  0\n",
       "   3  0\n",
       "   4  1\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  0\n",
       "1046  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Size of the family on board: the passenger + number of siblings, spouses, parents, children\n",
    "df_train['family_size'] = (df_train.sibsp + df_train.parch + 1)\n",
    "display(df_train['family_size'])\n",
    "\n",
    "# Whether or not a passenger is alone, i.e. the family consists of the passenger only\n",
    "df_train['is_alone'] = (df_train.family_size == 1).astype('int')\n",
    "display(df_train['is_alone'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's create two new features:\n",
    " - age $\\times$  class\n",
    " - fare per family member, i.e. fare $/$ family_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.831478Z",
     "start_time": "2020-05-01T17:12:38.823592Z"
    }
   },
   "outputs": [],
   "source": [
    "# Create new features\n",
    "df_train['age_times_class'] = df_train.age * df_train.pclass\n",
    "\n",
    "# fare per person in the family\n",
    "df_train['fare_per_family_member'] = df_train.fare / df_train.family_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling (part 1): gradient boosted trees\n",
    "\n",
    "Since this dataset contains a lot of categorical features, we will start with a tree-based model. Thus, we will gear the following feature pre-processing towards the use of tree-based models.\n",
    "\n",
    "### Feature pre-processing for boosted tree models\n",
    "\n",
    "The features \"sex\", \"embarked\", and \"deck\" can be simply label encoded. The feature \"name_title\" has a higher cardinality relative to the size of the training set, and in this case we will use the Frequency Encoder."
   ]
  },
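  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the two encodings, here is a plain-Python sketch on a handful of made-up titles. It only shows the idea: the integer codes assigned by `vaex.ml.LabelEncoder` may follow a different order, and the variable names are chosen for this example.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "titles = ['Mr', 'Mr', 'Mrs', 'Miss', 'Mr', 'Master']\n",
    "\n",
    "# Label encoding: map each distinct value to an integer (order of first appearance here)\n",
    "label_map = {value: code for code, value in enumerate(dict.fromkeys(titles))}\n",
    "label_encoded = [label_map[t] for t in titles]\n",
    "\n",
    "# Frequency encoding: replace each value by its relative frequency in the training data\n",
    "counts = Counter(titles)\n",
    "frequency_encoded = [counts[t] / len(titles) for t in titles]\n",
    "```\n",
    "\n",
    "With frequency encoding, rare titles end up with small values and common ones with large values, so the encoded column stays a single numeric feature regardless of how many distinct titles there are."
   ]
  },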
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:38.983682Z",
     "start_time": "2020-05-01T17:12:38.833258Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:MainThread:numexpr.utils:NumExpr defaulting to 4 threads.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                          </th><th>sex   </th><th>age  </th><th>sibsp  </th><th>parch  </th><th>ticket   </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                           </th><th>name_title  </th><th>name_num_words  </th><th>deck  </th><th>multi_cabin  </th><th>has_cabin  </th><th>family_size  </th><th>is_alone  </th><th>age_times_class  </th><th>fare_per_family_member  </th><th>label_encoded_sex  </th><th>label_encoded_embarked  </th><th>label_encoded_deck  </th><th>frequency_encoded_name_title  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>3       </td><td>False     </td><td>Stoytcheff, Mr. Ilia                          </td><td>male  </td><td>19.0 </td><td>0      </td><td>0      </td><td>349205   </td><td>7.8958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>57.0             </td><td>7.8958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>False     </td><td>Payne, Mr. Vivian Ponsonby                    </td><td>male  </td><td>23.0 </td><td>0      </td><td>0      </td><td>12749    </td><td>93.5    </td><td>B24    </td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>23.0             </td><td>93.5                    </td><td>0                  </td><td>0                       </td><td>1                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>3       </td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)              </td><td>female</td><td>35.0 </td><td>1      </td><td>1      </td><td>C.A. 2673</td><td>20.25   </td><td>M      </td><td>S         </td><td>A     </td><td>nan   </td><td>East Providence, RI                 </td><td>Mrs         </td><td>5               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>105.0            </td><td>6.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>2       </td><td>True      </td><td>Hocking, Miss. Ellen \"Nellie\"                 </td><td>female</td><td>20.0 </td><td>2      </td><td>1      </td><td>29105    </td><td>23.0    </td><td>M      </td><td>S         </td><td>4     </td><td>nan   </td><td>Cornwall / Akron, OH                </td><td>Miss        </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>4            </td><td>0         </td><td>40.0             </td><td>5.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.20152817574021012           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>3       </td><td>False     </td><td>Nilsson, Mr. August Ferdinand                 </td><td>male  </td><td>21.0 </td><td>0      </td><td>0      </td><td>350410   </td><td>7.8542  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>63.0             </td><td>7.8542                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                           </td><td>...   </td><td>...  </td><td>...    </td><td>...    </td><td>...      </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                                 </td><td>...         </td><td>...             </td><td>...   </td><td>...          </td><td>...        </td><td>...          </td><td>...       </td><td>...              </td><td>...                     </td><td>...                </td><td>...                     </td><td>...                 </td><td>...                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>3       </td><td>False     </td><td>Goodwin, Master. Sidney Leonard               </td><td>male  </td><td>1.0  </td><td>5      </td><td>2      </td><td>CA 2144  </td><td>46.9    </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Wiltshire, England Niagara Falls, NY</td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>8            </td><td>0         </td><td>3.0              </td><td>5.8625                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>3       </td><td>False     </td><td>Ahlin, Mrs. Johan (Johanna Persdotter Larsson)</td><td>female</td><td>40.0 </td><td>1      </td><td>0      </td><td>7546     </td><td>9.475   </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Sweden Akeley, MN                   </td><td>Mrs         </td><td>6               </td><td>M     </td><td>0            </td><td>1          </td><td>2            </td><td>0         </td><td>120.0            </td><td>4.7375                  </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>3       </td><td>True      </td><td>Johnson, Master. Harold Theodor               </td><td>male  </td><td>4.0  </td><td>1      </td><td>1      </td><td>347742   </td><td>11.1333 </td><td>M      </td><td>S         </td><td>15    </td><td>nan   </td><td>None                                </td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>12.0             </td><td>3.7111                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>1       </td><td>False     </td><td>Baxter, Mr. Quigg Edmond                      </td><td>male  </td><td>24.0 </td><td>0      </td><td>1      </td><td>PC 17558 </td><td>247.5208</td><td>B58 B60</td><td>C         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>1            </td><td>1          </td><td>2            </td><td>0         </td><td>24.0             </td><td>123.7604                </td><td>0                  </td><td>2                       </td><td>1                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>3       </td><td>False     </td><td>Coleff, Mr. Satio                             </td><td>male  </td><td>24.0 </td><td>0      </td><td>0      </td><td>349209   </td><td>7.4958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>72.0             </td><td>7.4958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      pclass    survived    name                                            sex     age    sibsp    parch    ticket     fare      cabin    embarked    boat    body    home_dest                             name_title    name_num_words    deck    multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title\n",
       "0      3         False       Stoytcheff, Mr. Ilia                            male    19.0   0        0        349205     7.8958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           57.0               7.8958                    0                    0                         0                     0.5787965616045845\n",
       "1      1         False       Payne, Mr. Vivian Ponsonby                      male    23.0   0        0        12749      93.5      B24      S           None    nan     Montreal, PQ                          Mr            4                 B       0              1            1              0           23.0               93.5                      0                    0                         1                     0.5787965616045845\n",
       "2      3         True        Abbott, Mrs. Stanton (Rosa Hunt)                female  35.0   1        1        C.A. 2673  20.25     M        S           A       nan     East Providence, RI                   Mrs           5                 M       0              1            3              0           105.0              6.75                      1                    0                         0                     0.1451766953199618\n",
       "3      2         True        Hocking, Miss. Ellen \"Nellie\"                   female  20.0   2        1        29105      23.0      M        S           4       nan     Cornwall / Akron, OH                  Miss          4                 M       0              1            4              0           40.0               5.75                      1                    0                         0                     0.20152817574021012\n",
       "4      3         False       Nilsson, Mr. August Ferdinand                   male    21.0   0        0        350410     7.8542    M        S           None    nan     None                                  Mr            4                 M       0              1            1              0           63.0               7.8542                    0                    0                         0                     0.5787965616045845\n",
       "...    ...       ...         ...                                             ...     ...    ...      ...      ...        ...       ...      ...         ...     ...     ...                                   ...           ...               ...     ...            ...          ...            ...         ...                ...                       ...                  ...                       ...                   ...\n",
       "1,042  3         False       Goodwin, Master. Sidney Leonard                 male    1.0    5        2        CA 2144    46.9      M        S           None    nan     Wiltshire, England Niagara Falls, NY  Master        4                 M       0              1            8              0           3.0                5.8625                    0                    0                         0                     0.045845272206303724\n",
       "1,043  3         False       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0   1        0        7546       9.475     M        S           None    nan     Sweden Akeley, MN                     Mrs           6                 M       0              1            2              0           120.0              4.7375                    1                    0                         0                     0.1451766953199618\n",
       "1,044  3         True        Johnson, Master. Harold Theodor                 male    4.0    1        1        347742     11.1333   M        S           15      nan     None                                  Master        4                 M       0              1            3              0           12.0               3.7111                    0                    0                         0                     0.045845272206303724\n",
       "1,045  1         False       Baxter, Mr. Quigg Edmond                        male    24.0   0        1        PC 17558   247.5208  B58 B60  C           None    nan     Montreal, PQ                          Mr            4                 B       1              1            2              0           24.0               123.7604                  0                    2                         1                     0.5787965616045845\n",
       "1,046  3         False       Coleff, Mr. Satio                               male    24.0   0        0        349209     7.4958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           72.0               7.4958                    0                    0                         0                     0.5787965616045845"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "label_encoder = vaex.ml.LabelEncoder(features=['sex', 'embarked', 'deck'], allow_unseen=True)\n",
    "df_train = label_encoder.fit_transform(df_train)\n",
    "\n",
    "# While doing a transform, previously unseen values will be encoded as \"zero\".\n",
    "frequency_encoder = vaex.ml.FrequencyEncoder(features=['name_title'], unseen='zero')\n",
    "df_train = frequency_encoder.fit_transform(df_train)\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once all the categorical data is encoded, we can select the features we are going to use for training the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:39.052837Z",
     "start_time": "2020-05-01T17:12:38.986328Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  name_num_words</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  age</th><th style=\"text-align: right;\">   fare</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               3</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               57</td><td style=\"text-align: right;\">                  7.8958</td><td style=\"text-align: right;\">   19</td><td style=\"text-align: right;\"> 7.8958</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   1</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               23</td><td style=\"text-align: right;\">                 93.5   </td><td style=\"text-align: right;\">   23</td><td style=\"text-align: right;\">93.5   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               5</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            3</td><td style=\"text-align: right;\">              105</td><td style=\"text-align: right;\">                  6.75  </td><td style=\"text-align: right;\">   35</td><td style=\"text-align: right;\">20.25  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.201528</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            4</td><td style=\"text-align: right;\">               40</td><td style=\"text-align: right;\">                  5.75  </td><td style=\"text-align: right;\">   20</td><td style=\"text-align: right;\">23     </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               63</td><td style=\"text-align: right;\">                  7.8542</td><td style=\"text-align: right;\">   21</td><td style=\"text-align: right;\"> 7.8542</td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    multi_cabin    name_num_words    has_cabin    is_alone    family_size    age_times_class    fare_per_family_member    age     fare\n",
       "  0                    0                         0                     0                        0.578797              0                 3            1           0              1                 57                    7.8958     19   7.8958\n",
       "  1                    0                         0                     1                        0.578797              0                 4            1           0              1                 23                   93.5        23  93.5\n",
       "  2                    1                         0                     0                        0.145177              0                 5            1           0              3                105                    6.75       35  20.25\n",
       "  3                    1                         0                     0                        0.201528              0                 4            1           0              4                 40                    5.75       20  23\n",
       "  4                    0                         0                     0                        0.578797              0                 4            1           0              1                 63                    7.8542     21   7.8542"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# features to use for the trainin of the boosting model\n",
    "encoded_features = df_train.get_column_names(regex='^freque|^label')\n",
    "features = encoded_features + ['multi_cabin', 'name_num_words', \n",
    "                               'has_cabin', 'is_alone', \n",
    "                               'family_size', 'age_times_class',\n",
    "                               'fare_per_family_member',\n",
    "                               'age', 'fare']\n",
    "\n",
    "# Preview the feature matrix\n",
    "df_train[features].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Estimator: [xgboost](https://xgboost.readthedocs.io/en/latest/)\n",
    "\n",
    "Now let's feed this data into an a tree based estimator. In this example we will use [xgboost](https://xgboost.readthedocs.io/en/latest/). In principle, any algorithm that follows the [scikit-learn](https://scikit-learn.org/stable/) API convention, i.e. it contains the `.fit`, `.predict` methods is compatable with `vaex`. However, the data will be materialized, i.e. will be read into memory before it is passed on to the estimators. We are hard at work trying to make at least some of the estimators from [scikit-learn](https://scikit-learn.org/stable/) run out-of-core!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:40.968831Z",
     "start_time": "2020-05-01T17:12:39.055474Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                          </th><th>sex   </th><th>age  </th><th>sibsp  </th><th>parch  </th><th>ticket   </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                           </th><th>name_title  </th><th>name_num_words  </th><th>deck  </th><th>multi_cabin  </th><th>has_cabin  </th><th>family_size  </th><th>is_alone  </th><th>age_times_class  </th><th>fare_per_family_member  </th><th>label_encoded_sex  </th><th>label_encoded_embarked  </th><th>label_encoded_deck  </th><th>frequency_encoded_name_title  </th><th>prediction_xgb  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>3       </td><td>False     </td><td>Stoytcheff, Mr. Ilia                          </td><td>male  </td><td>19.0 </td><td>0      </td><td>0      </td><td>349205   </td><td>7.8958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>57.0             </td><td>7.8958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>False     </td><td>Payne, Mr. Vivian Ponsonby                    </td><td>male  </td><td>23.0 </td><td>0      </td><td>0      </td><td>12749    </td><td>93.5    </td><td>B24    </td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>23.0             </td><td>93.5                    </td><td>0                  </td><td>0                       </td><td>1                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>3       </td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)              </td><td>female</td><td>35.0 </td><td>1      </td><td>1      </td><td>C.A. 2673</td><td>20.25   </td><td>M      </td><td>S         </td><td>A     </td><td>nan   </td><td>East Providence, RI                 </td><td>Mrs         </td><td>5               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>105.0            </td><td>6.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>2       </td><td>True      </td><td>Hocking, Miss. Ellen \"Nellie\"                 </td><td>female</td><td>20.0 </td><td>2      </td><td>1      </td><td>29105    </td><td>23.0    </td><td>M      </td><td>S         </td><td>4     </td><td>nan   </td><td>Cornwall / Akron, OH                </td><td>Miss        </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>4            </td><td>0         </td><td>40.0             </td><td>5.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.20152817574021012           </td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>3       </td><td>False     </td><td>Nilsson, Mr. August Ferdinand                 </td><td>male  </td><td>21.0 </td><td>0      </td><td>0      </td><td>350410   </td><td>7.8542  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>63.0             </td><td>7.8542                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                           </td><td>...   </td><td>...  </td><td>...    </td><td>...    </td><td>...      </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                                 </td><td>...         </td><td>...             </td><td>...   </td><td>...          </td><td>...        </td><td>...          </td><td>...       </td><td>...              </td><td>...                     </td><td>...                </td><td>...                     </td><td>...                 </td><td>...                           </td><td>...             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>3       </td><td>False     </td><td>Goodwin, Master. Sidney Leonard               </td><td>male  </td><td>1.0  </td><td>5      </td><td>2      </td><td>CA 2144  </td><td>46.9    </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Wiltshire, England Niagara Falls, NY</td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>8            </td><td>0         </td><td>3.0              </td><td>5.8625                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>3       </td><td>False     </td><td>Ahlin, Mrs. Johan (Johanna Persdotter Larsson)</td><td>female</td><td>40.0 </td><td>1      </td><td>0      </td><td>7546     </td><td>9.475   </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Sweden Akeley, MN                   </td><td>Mrs         </td><td>6               </td><td>M     </td><td>0            </td><td>1          </td><td>2            </td><td>0         </td><td>120.0            </td><td>4.7375                  </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>3       </td><td>True      </td><td>Johnson, Master. Harold Theodor               </td><td>male  </td><td>4.0  </td><td>1      </td><td>1      </td><td>347742   </td><td>11.1333 </td><td>M      </td><td>S         </td><td>15    </td><td>nan   </td><td>None                                </td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>12.0             </td><td>3.7111                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>1       </td><td>False     </td><td>Baxter, Mr. Quigg Edmond                      </td><td>male  </td><td>24.0 </td><td>0      </td><td>1      </td><td>PC 17558 </td><td>247.5208</td><td>B58 B60</td><td>C         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>1            </td><td>1          </td><td>2            </td><td>0         </td><td>24.0             </td><td>123.7604                </td><td>0                  </td><td>2                       </td><td>1                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>3       </td><td>False     </td><td>Coleff, Mr. Satio                             </td><td>male  </td><td>24.0 </td><td>0      </td><td>0      </td><td>349209   </td><td>7.4958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>72.0             </td><td>7.4958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      pclass    survived    name                                            sex     age    sibsp    parch    ticket     fare      cabin    embarked    boat    body    home_dest                             name_title    name_num_words    deck    multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    prediction_xgb\n",
       "0      3         False       Stoytcheff, Mr. Ilia                            male    19.0   0        0        349205     7.8958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           57.0               7.8958                    0                    0                         0                     0.5787965616045845              False\n",
       "1      1         False       Payne, Mr. Vivian Ponsonby                      male    23.0   0        0        12749      93.5      B24      S           None    nan     Montreal, PQ                          Mr            4                 B       0              1            1              0           23.0               93.5                      0                    0                         1                     0.5787965616045845              False\n",
       "2      3         True        Abbott, Mrs. Stanton (Rosa Hunt)                female  35.0   1        1        C.A. 2673  20.25     M        S           A       nan     East Providence, RI                   Mrs           5                 M       0              1            3              0           105.0              6.75                      1                    0                         0                     0.1451766953199618              True\n",
       "3      2         True        Hocking, Miss. Ellen \"Nellie\"                   female  20.0   2        1        29105      23.0      M        S           4       nan     Cornwall / Akron, OH                  Miss          4                 M       0              1            4              0           40.0               5.75                      1                    0                         0                     0.20152817574021012             True\n",
       "4      3         False       Nilsson, Mr. August Ferdinand                   male    21.0   0        0        350410     7.8542    M        S           None    nan     None                                  Mr            4                 M       0              1            1              0           63.0               7.8542                    0                    0                         0                     0.5787965616045845              False\n",
       "...    ...       ...         ...                                             ...     ...    ...      ...      ...        ...       ...      ...         ...     ...     ...                                   ...           ...               ...     ...            ...          ...            ...         ...                ...                       ...                  ...                       ...                   ...                             ...\n",
       "1,042  3         False       Goodwin, Master. Sidney Leonard                 male    1.0    5        2        CA 2144    46.9      M        S           None    nan     Wiltshire, England Niagara Falls, NY  Master        4                 M       0              1            8              0           3.0                5.8625                    0                    0                         0                     0.045845272206303724            False\n",
       "1,043  3         False       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0   1        0        7546       9.475     M        S           None    nan     Sweden Akeley, MN                     Mrs           6                 M       0              1            2              0           120.0              4.7375                    1                    0                         0                     0.1451766953199618              False\n",
       "1,044  3         True        Johnson, Master. Harold Theodor                 male    4.0    1        1        347742     11.1333   M        S           15      nan     None                                  Master        4                 M       0              1            3              0           12.0               3.7111                    0                    0                         0                     0.045845272206303724            True\n",
       "1,045  1         False       Baxter, Mr. Quigg Edmond                        male    24.0   0        1        PC 17558   247.5208  B58 B60  C           None    nan     Montreal, PQ                          Mr            4                 B       1              1            2              0           24.0               123.7604                  0                    2                         1                     0.5787965616045845              False\n",
       "1,046  3         False       Coleff, Mr. Satio                               male    24.0   0        0        349209     7.4958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           72.0               7.4958                    0                    0                         0                     0.5787965616045845              False"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import xgboost\n",
    "import vaex.ml.sklearn\n",
    "\n",
    "# Instantiate the xgboost model normally, using the scikit-learn API\n",
    "xgb_model = xgboost.sklearn.XGBClassifier(max_depth=11,\n",
    "                                          learning_rate=0.1, \n",
    "                                          n_estimators=500, \n",
    "                                          subsample=0.75, \n",
    "                                          colsample_bylevel=1, \n",
    "                                          colsample_bytree=1,\n",
    "                                          scale_pos_weight=1.5,\n",
    "                                          reg_lambda=1.5, \n",
    "                                          reg_alpha=5, \n",
    "                                          n_jobs=-1,\n",
    "                                          random_state=42,\n",
    "                                          verbosity=0)\n",
    "\n",
    "# Make it work with vaex (for the automagic pipeline and lazy predictions)\n",
    "vaex_xgb_model = vaex.ml.sklearn.Predictor(features=features,\n",
    "                                           target='survived',\n",
    "                                           model=xgb_model, \n",
    "                                           prediction_name='prediction_xgb')\n",
    "# Train the model\n",
    "vaex_xgb_model.fit(df_train)\n",
    "# Get the prediction of the model on the training data\n",
    "df_train = vaex_xgb_model.transform(df_train)\n",
    "\n",
    "# Preview the resulting train dataframe that contans the predictions\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that in the above cell block, we call `.transform` on the `vaex_xgb_model` object. This adds the \"prediction_xgb\" column as _virtual column_ in the output dataframe. This can be quite convenient when calculating various metrics and making diagnosic plots. Of course, one can call a `.predict` on the `vaex_xgb_model` object, which returns an in-memory `numpy` array object housing the predictions.\n",
    "\n",
    "### Performance on training set\n",
    "\n",
    "Anyway, let's see what the performance is of the model on the training set. First let's create a convenience function that will help us get multiple metrics at once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:40.985268Z",
     "start_time": "2020-05-01T17:12:40.975947Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score, f1_score, roc_auc_score\n",
    "def binary_metrics(y_true, y_pred):\n",
    "    acc = accuracy_score(y_true=y_true, y_pred=y_pred)\n",
    "    f1 = f1_score(y_true=y_true, y_pred=y_pred)\n",
    "    roc = roc_auc_score(y_true=y_true, y_score=y_pred)\n",
    "    print(f'Accuracy: {acc:.3f}')\n",
    "    print(f'f1 score: {f1:.3f}')\n",
    "    print(f'roc-auc: {roc:.3f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's check the performance of the model on the training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.088203Z",
     "start_time": "2020-05-01T17:12:40.988951Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metrics for the training set:\n",
      "Accuracy: 0.924\n",
      "f1 score: 0.896\n",
      "roc-auc: 0.914\n"
     ]
    }
   ],
   "source": [
    "print('Metrics for the training set:')\n",
    "binary_metrics(y_true=df_train.survived.values, y_pred=df_train.prediction_xgb.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Automatic pipelines\n",
    "\n",
    "Now, let's inspect the performance of the model on the test set. You probably noticed that, unlike when using other libraries, we did not bother to create a pipeline while doing all the cleaning, inputing, feature engineering and categorial encoding. Well, we did not _explicitly_ create a pipeline. In fact `veax` keeps track of all the changes one applies to a DataFrame in something called a state. A state is the place which contains all the informations regarding, for instance, the virtual columns we've created, which includes the newly engineered features, the categorically encoded columns, and even the model prediction! So all we need to do, is to extract the state from the training DataFrame, and apply it to the test DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.299459Z",
     "start_time": "2020-05-01T17:12:41.093866Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                                        </th><th>sex   </th><th style=\"text-align: right;\">   age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket          </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest               </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th>prediction_xgb  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>O'Connor, Mr. Patrick                       </td><td>male  </td><td style=\"text-align: right;\">28.032</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>366713          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           84.096</td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Canavan, Mr. Patrick                        </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>364858          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>Ireland Philadelphia, PA</td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Ovies y Rodriguez, Mr. Servando             </td><td>male  </td><td style=\"text-align: right;\">28.5  </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>PC 17562        </td><td style=\"text-align: right;\">27.7208</td><td>D43    </td><td>C         </td><td>None  </td><td style=\"text-align: right;\">   189</td><td>?Havana, Cuba           </td><td>Mr          </td><td style=\"text-align: right;\">               5</td><td>D     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           28.5  </td><td style=\"text-align: right;\">                 27.7208</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   4</td><td style=\"text-align: right;\">                      0.578797</td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Windelov, Mr. Einar                         </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>SOTON/OQ 3101317</td><td style=\"text-align: right;\"> 7.25  </td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.25  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td style=\"text-align: right;\">25    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      1</td><td>230433          </td><td style=\"text-align: right;\">26     </td><td>M      </td><td>S         </td><td>12    </td><td style=\"text-align: right;\">   nan</td><td>Deer Lodge, MT          </td><td>Mrs         </td><td style=\"text-align: right;\">               6</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            2</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           50    </td><td style=\"text-align: right;\">                 13     </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td>True            </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                                          sex        age    sibsp    parch  ticket               fare  cabin    embarked    boat      body  home_dest                 name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title  prediction_xgb\n",
       "  0         3  False       O'Connor, Mr. Patrick                         male    28.032        0        0  366713             7.75    M        Q           None       nan  None                      Mr                           3  M                   0            1              1           0             84.096                    7.75                      0                         1                     0                        0.578797  False\n",
       "  1         3  False       Canavan, Mr. Patrick                          male    21            0        0  364858             7.75    M        Q           None       nan  Ireland Philadelphia, PA  Mr                           3  M                   0            1              1           0             63                        7.75                      0                         1                     0                        0.578797  False\n",
       "  2         1  False       Ovies y Rodriguez, Mr. Servando               male    28.5          0        0  PC 17562          27.7208  D43      C           None       189  ?Havana, Cuba             Mr                           5  D                   0            1              1           0             28.5                     27.7208                    0                         2                     4                        0.578797  True\n",
       "  3         3  False       Windelov, Mr. Einar                           male    21            0        0  SOTON/OQ 3101317   7.25    M        S           None       nan  None                      Mr                           3  M                   0            1              1           0             63                        7.25                      0                         0                     0                        0.578797  False\n",
       "  4         2  True        Shelley, Mrs. William (Imanita Parrish Hall)  female  25            0        1  230433            26       M        S           12         nan  Deer Lodge, MT            Mrs                          6  M                   0            1              2           0             50                       13                         1                         0                     0                        0.145177  True"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# state transfer to the test set\n",
    "state = df_train.state_get()\n",
    "df_test.state_set(state)\n",
    "\n",
    "# Preview of the \"transformed\" test set\n",
    "df_test.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that once we apply the state from the train to the test set, the test DataFrame contains all the features we created or modified in the training data, and even the predictions of the xgboost model!\n",
    "\n",
    "The state is a simple Python dictionary, which can be easily stored as JSON to disk, which makes it very easy to deploy.\n",
    "\n",
    "### Performance on test set\n",
    "\n",
    "Now it is trivial to check the model performance on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.381884Z",
     "start_time": "2020-05-01T17:12:41.310025Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metrics for the test set:\n",
      "Accuracy: 0.798\n",
      "f1 score: 0.744\n",
      "roc-auc: 0.785\n"
     ]
    }
   ],
   "source": [
    "print('Metrics for the test set:')\n",
    "binary_metrics(y_true=df_test.survived.values, y_pred=df_test.prediction_xgb.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature importance\n",
    "Let's now look at the feature importance of the `xgboost` model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.911379Z",
     "start_time": "2020-05-01T17:12:41.384369Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAIbCAYAAABLzPzHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nOzdeZgdZZ3+//dN2AkGMCgmhsSRRQfxC9iAOKPAoCJGBBdQENmUzXEAf6yDjiIKwggCGkcWQQWU1UFQYFgGgyiLdGQzosMWCGEnZCMsWe7fH/W0Ho7dfU53uqvT5H5dV1+eU/Usn6qTmbr7qTqNbBMRERFRl+WGuoCIiIhYtiR8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMi4jVC0rqS5kkaMdS1RPQm4SMiYikj6dOSbpf0gqSny+svSFJv/Ww/anuk7UV11RrRHwkfERFLEUmHAacD3wbWAd4IHAj8E7DiEJYWMWCUv3AaEbF0kDQKeBzY0/bPe2gzEfgm8FZgNnCO7WPLvgnAw8AKthdKmgzcDPwL8E7gVmB3288O6oFEtJCVj4iIpcdWwErAFb20eQHYE1gDmAgcJGnnXtrvDuwDvIFq5eTwgSk1ov8SPiIilh6jgWdtL+zaIOkWSbMkvSjpfbYn277X9mLb9wAXAlv3MuaPbP+f7ReBS4BNBvcQIlpL+IiIWHo8B4yWtHzXBtvvsb1G2becpC0l/VrSM5JmUz0PMrqXMZ9seD0fGDkYhUf0RcJHRMTS41bgZWCnXtr8DLgSGGd7FHAG0Ou3YCKWNgkfERFLCduzgK8D/yXpk5JGSlpO0ibAaqXZ6sBM2y9J2oLqmY6IYWX51k0iIqIutv9T0gzgSOA8qgdMHwKOAm4BvgCcImkScBPVcxxrDFG5Ef2Sr9pGRERErXLbJSIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJW+aptRBtGjx7tCRMmDHUZERHDypQpU561vXbz9oSPiDZMmDCBzs7OoS4jImJYkfRId9tz2yUiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK2WH+oCXkskTQM+b/uGFu0MrG/7gX7M0e++dZL0Y+Ax21+ps+9guXfGbCYcfdVQlxERUatpJ04clHGz8hERERG1SviIiIiIWiV8DAJJW0i6VdIsSU9ImiRpxaZmH5b0kKRnJX1b0nIN/feVdJ+k5yVdK2l8H+dfSdLJkh6V9JSkMyStUvZtI+kxSYdJerrUt09D31UknSLpEUmzJf22oe9HJU0txzVZ0tsb+m0q6Q+S5kq6GFi5qaaPSLqr9L1F0jvb7dvDMY6W9Ksy3kxJN3edQ0ljJP1c0jOSHpZ0cNm+Vjn2Hcv7kZIekLRnD3PsL6lTUuei+bPb/wAiIqJXCR+DYxHwJWA0sBWwHfCFpjYfAzqAzYCdgH0BJO0MHAN8HFgbuBm4sI/znwRsAGwCrAeMBb7asH8dYFTZ/jng+5LWLPtOBt4FvAdYCzgSWCxpg1LHoaWuq4FfSlqxBKtfAOeXPpcCn+iaTNJmwLnAAcDrgTOBK0tI6rVvLw4DHiu1vJHqnLkEkF8Cd5fj2w44VNL2tmdSneezJb0BOBW4y/Z53U1g+yzbHbY7Rqw6qo2SIiKiHQkfg8D2FNu32V5oexrVxXbrpmYn2Z5p+1HgNGC3sv0A4Fu277O9EDgB2KTd1Q9JAvYDvlTGn1vG+HRDswXAcbYX2L4amAdsWC7c+wKH2J5he5HtW2y/DHwKuMr29bYXUIWUVahCyruBFYDTypiXAXc0
zLcfcKbt28uYPwFeLv1a9e3JAuBNwPjS72bbBjYH1rZ9nO1XbD8EnN11/Lavowo4/wtMLOc7IiJqlPAxCCRtUG4JPClpDtXFf3RTs+kNrx8BxpTX44HTy+2EWcBMQFS/xbdjbWBVYErDGP9Ttnd5rgSbLvOBkaXGlYEHuxl3TKkTANuLyzGMLftmlIt/4zF1GQ8c1lVPqWlc6deqb0++DTwAXFduXx3dMNeYprmOoVod6XIW8A7gR7afa2OuiIgYQPmq7eD4AXAnsJvtuZIOBT7Z1GYcMLW8Xhd4vLyeDhxv+6f9nPtZ4EVgI9sz+tH3JeCtVLctGj0ObNz1pqywjANmAAbGSlJDiFiXv4WYrmM6vnlCSVu36NutsqJzGFWo2Qj4taQ7ylwP216/u36SRlCtRJ0HHCTpR+18bXnjsaPoHKSvnEVELGuy8jE4VgfmAPMkvQ04qJs2R0haU9I44BDg4rL9DODfywUVSaMk7dLuxGVF4mzg1PJcA5LGStq+zb7nAt8pD22OkLSVpJWAS4CJkraTtALVhf9l4BbgVmAhcLCk5SV9HNiiYeizgQMlbanKapImSlq9jb7dKg+wrldC0Byq52wWAb8H5kg6qjw8O0LSOyRtXroeU/53X6pbR+eVQBIRETVJ+BgchwO7A3OpLrwXd9PmCmAKcBdwFXAOgO3LqR4YvajcsvkjsEMf5z+K6pbEbWWMG4AN+1D7vVTPXcwstSxn+y/AHsD3qFZIdgR2LM9VvEL1gOzewPNUz4f8d9eAtjupnvuYVPY/UNrSqm8v1i/HNY8qwPyX7cm2F5XaNgEeLrX+EBgl6V3A/wfsWdqdRLVqc3Q340dExCDRq2+1R0R3Ojo63NnZOdRlREQMK5Km2O5o3p6Vj4iIiKhVwscwVf7Y17xufj4z1LUNFEnH9HCM1wx1bRER0X/5tsswZXujoa5hsNk+gepryhER8RqSlY+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErZZv1UDShsBFwHrAl21/d9Crir8j6cfAY7a/UmffpZmk9wI/tL1hD/snAA8DK9heuCRz3TtjNhOOvmpJhoiI6NW0EycOdQm1aWfl40hgsu3VEzxiKEmypPW63tu+uTF4SJom6f1DU11ERLSrnfAxHpja3Q5JIwa2nIiIiHit6zV8SLoR2BaYJGmepJ9J+oGkqyW9AGwraYykn0t6RtLDkg5u6L+KpB9Lel7SnyQdIemxhv2v+k22tP1mw/uPSLpL0ixJt0h6Z8O+aZIOl3SPpNmSLpa0csP+nUrfOZIelPQhSbtImtJ0jIdJ+kWL87CSpJMlPSrpKUlnSFql7NtG0mNlnKclPSFpn6ZzcIqkR0qdv23o+1FJU8vxTZb09oZ+m0r6g6S5ki4GVm6qqbdz02vfHo6x1XFMlHRnOZ/TJR3bsG9C+Sz3Kfuel3SgpM3L5zNL0qSm+faVdF9pe62k8S3q+015eXf5t/iprprL/vOBdYFflv1HdjPGKEnnlGObIembCdAREfXrNXzY/hfgZuCLtkcCrwC7A8cDqwO3AL8E7gbGAtsBh0ravgzxNeCt5Wd7YK92C5O0GXAucADweuBM4EpJKzU02xX4EPAW4J3A3qXvFsB5wBHAGsD7gGnAlcBbGi/ywB7A+S3KOQnYANiE6tmXscBXG/avA4wq2z8HfF/SmmXfycC7gPcAa1HdxlosaQPgQuBQYG3gaqoL54qSVgR+UepaC7gU+EQ756ZV3xZ6O44XgD2pzudE4CBJOzf13xJYH/gUcBrwZeD9wEbArpK2LvXvDBwDfLwc+83lXPTI9vvKy/9ne6Tti5v2
fxZ4FNix7P/Pbob5CbCQ6jPcFPgg8Pme5pS0v6ROSZ2L5s/urbyIiOiD/nzb5Qrbv7O9GNgYWNv2cbZfsf0QcDbw6dJ2V+B42zNtTwf68szIfsCZtm+3vcj2T4CXgXc3tPmu7cdtz6QKQZuU7Z8DzrV9ve3FtmfY/rPtl4GLqQIHkjYCJgC/6qkISSq1fKkcx1zghIZjBFgAHGd7ge2rgXnAhpKWA/YFDik1LLJ9S6njU8BVpcYFVCFlFaqQ8m5gBeC0MuZlwB1tnptWfXvT7XEA2J5s+95yPu+hCgtbN/X/hu2XbF9HFVYutP207RlUAWPT0u4A4Fu27ysPgp4AbNJq9WNJSHojsANwqO0XbD8NnMqrP8dXsX2W7Q7bHSNWHTVYpUVELHNaftulG9MbXo8Hxkia1bBtBNWFBmBMU/tH+jDPeGAvSf/WsG3FMmaXJxtez2/YN45qJaE7PwEulPQV4LPAJSUM9GRtYFVgSpVDABDVcXZ5runbFPOBkcBoqlseD3Yz7hgazoftxZKmU606LAJm2HZD+8Zz19u5cYu+venpOJC0JXAi8I4y10pUqyqNnmp4/WI370c21H+6pFMa9ovq2Pvyb6QvxlOFsicaPsflePW/z4iIqEF/wkfjRW068LDt9Xto+wRVEOh6YHXdpv3zqS7sXdYBup4JmU61anJ8P2qcTnWr5+/Yvk3SK8B7qW4h7d5irGepLpwbld/g++JZ4KVSy91N+x6nWjkC/rrCMg6YQXWOx0pSQ4hYl7+FmB7PTbm10Vvf/voZMAnYwfZLkk6jClf90VX/T5ewpmbuZd90qtWh0Uv6tduIiFgy/QkfjX4PzJF0FNUtlVeAtwOr2L4DuAT4d0m3A6sB/9bU/y5gd0lTgQ9QLeN3ln1nA5dLuqHMsyqwDfCbcuujN+cA10n6FfBr4E3A6rb/XPafR3UhXWj7t70NVFYkzgZOlfRF209LGgu8w/a1bfQ9F/iOpM9SrQRsAfyB6twcLWk74DfAIVQXx1tK94XAwZK+D3y09Pt1q3MD3Nqib3+tDswswWMLqtB2XT/HOgP4hqS7bE+VNAr4oO3mlZRmTwH/ADzQYv/fsf2EpOuAUyT9B9UtpbcAb7Z9U6uCNx47is5l6Dv4ERGDaYn+wqntRcCOVM9aPEz1m/4PqR5aBPg61TL6w1QXquYHOw8p/WcBn6F6ULJr7E6qZxsmAc9TXXD2brOu3wP7UN3Tnw3cRLXs3uV8qtsHrR407XJUmf82SXOAGyjPQrThcOBequcuZlI9vLqc7b9QPXvyParztiPVw5Kv2H6F6mHMvamO/VPAfzccX4/nplXfJfAF4DhJc6ketr2kvwPZvpzqPFxUzucfqZ7HaOVY4Cfl2zO7drP/W8BXyv7Du9m/J9Utoz9RnZvLqIJpRETUSK9+NGCQJ5O2AS6w/ebaJu2+jlWAp4HNbN8/lLXE8NDR0eHOzs7WDSMi4q8kTbHd0bx9Wf1vuxwE3JHgERERUb9lLnxImkZ1u+ewpu1Tyx+nav75zJAUOggkHdPDMV4z1LVB9d9q6aG+eUNdW0REDJxab7tEDFe57RIR0Xe57RIRERFLhYSPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8BJI2lHSnpLmSDh7AcT8j6bqG95a03kCN3zDuupLmSRox0GNHRMTAk+2hriGGmKRzgDm2vzTI8xhY3/YDgznPYFjpTev7TXudNtRlDFvTTpw41CVExBCQNMV2R/P2rHwEwHhg6lAXERERy4aEj2WcpBuBbYFJ5dbFIeUWzBxJ0yUd29B2Qrl1sk/Z97ykAyVtLukeSbMkTWpov7ek
33Yz5+aSnpK0fMO2T0i6q0WtW0jqLLU9Jek7TXUtL2mrchxdPy9JmlbaLSfpaEkPSnpO0iWS1lrScxgREX2T8LGMs/0vwM3AF22PBO4G9gTWACYCB0nauanblsD6wKeA04AvA+8HNgJ2lbR1iznvAJ4DPtCweQ/g/Bblng6cbvt1wFuBS7oZ+1bbI8uxrAncBlxYdh8M7AxsDYwBnge+39NkkvYvYadz0fzZLUqLiIh2JXzEq9iebPte24tt30N14W4OE9+w/ZLt64AXgAttP217BlWQ2bSNqX5CFTgoqw/bAz9r0WcBsJ6k0bbn2b6tRfvvlvq+XN4fAHzZ9mO2XwaOBT7ZuALTyPZZtjtsd4xYdVQbhxQREe1I+IhXkbSlpF9LekbSbOBAYHRTs6caXr/YzfuRbUx1AbCjpJHArsDNtp9o0edzwAbAnyXdIekjvRzHAcA2wO62F5fN44HLy+2hWcB9wCLgjW3UGxERAyThI5r9DLgSGGd7FHAGoIGepKyS3Ap8DPgsrW+5YPt+27sBbwBOAi6TtFpzO0nvBb4B7GS78X7JdGAH22s0/KxcaomIiJp0u9wcy7TVgZm2X5K0BbA7cF2LPv11HnA0ZUWiVWNJewDX2n6mrFxAtXLR2GYccDGwp+3/axriDOB4SXvZfkTS2sB7bF/Rau6Nx46iM18XjYgYEFn5iGZfAI6TNBf4Kt081DmALqcED9svtNH+Q8BUSfOoHj79tO2XmtpsB6xDtSrS9Y2Xrq8Rn061qnNdOb7bqB6ejYiIGuWPjMWQkvQgcIDtG4a6lt50dHS4s7NzqMuIiBhW8kfGYqkj6ROAgRuHupaIiKhPwkcMCUmTgR8A/9rwbRQkXdP0R8K6fo4ZsmIjImJA5YHTGBK2t+lh+w41lxIRETXLykdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUavlh7qA4UTSNODztm9o0c7A+rYf6Mcc/e5bJ0k/Bh6z/ZW6+g7EuZE0GbjA9g/70u/eGbOZcPRV/Z32NWHaiROHuoSIeI3IykdERETUKuEjIiIiapXw0Q+StpB0q6RZkp6QNEnSik3NPizpIUnPSvq2pOUa+u8r6T5Jz0u6VtL4Ps6/kqSTJT0q6SlJZ0hapezbRtJjkg6T9HSpb5+GvqtIOkXSI5JmS/ptQ9+PSppajmuypLc39NtU0h8kzZV0MbByU00fkXRX6XuLpHe227eX4zyi1P+4pH3bPQdl/06lnjmSHpT0oW7Gf5OkeyQd3k49ERExMBI++mcR8CVgNLAVsB3whaY2HwM6gM2AnYB9ASTtDBwDfBxYG7gZuLCP858EbABsAqwHjAW+2rB/HWBU2f454PuS1iz7TgbeBbwHWAs4ElgsaYNSx6GlrquBX0pasQSrXwDnlz6XAp/omkzSZsC5wAHA64EzgStLQOi1b09KWDgc+ACwPvD+ds+BpC2A84AjgDWA9wHTmsafANwETLJ9cg817C+pU1LnovmzW5UcERFtSvjoB9tTbN9me6HtaVQX262bmp1ke6btR4HTgN3K9gOAb9m+z/ZC4ARgk3ZXPyQJ2A/4Uhl/bhnj0w3NFgDH2V5g+2pgHrBhWX3ZFzjE9gzbi2zfYvtl4FPAVbavt72AKqSsQhVS3g2sAJxWxrwMuKNhvv2AM23fXsb8CfBy6deqb092BX5k+4+2XwCO7cM5+BxwbjmWxeVY/9ww9j8Ck4Gv2T6rpwJsn2W7w3bHiFVHtVFyRES0I9926YeySvAdqpWNVanO45SmZtMbXj8CjCmvxwOnSzqlcUiq39wfaWP6tcucU6pr8F/7j2ho
81wJNl3mAyOpVmpWBh7sZtwxjfPbXixpeqlrETDDtpuOqct4YC9J/9awbcUyplv07ckYXn1OG/u0OgfjqFZuevIZ4AHgsjbqiIiIAZbw0T8/AO4EdrM9V9KhwCeb2owDppbX6wKPl9fTgeNt/7Sfcz8LvAhsZHtGP/q+BLwVuLtp3+PAxl1vyurCOGAGVYAYK0kNIWJd/hZiuo7p+OYJJW3dom9Pnijzd1m36Th6OwfTyzH25FjgQ8DPJH3a9qIWtbDx2FF05qumEREDIrdd+md1YA4wT9LbgIO6aXOEpDUljQMOAS4u288A/l3SRgCSRknapd2JbS8GzgZOlfSGMsZYSdu32fdc4DuSxkgaIWkrSSsBlwATJW0naQXgMKpbJ7cAtwILgYMlLS/p48AWDUOfDRwoaUtVVpM0UdLqbfTtySXA3pL+UdKqwNf6cA7OAfYpx7Jc2fe2hrEXALsAqwHnq+Fh4IiIGHz5f7r9cziwOzCX6iJ4cTdtrqC6bXAXcBXVBRHbl1M9LHmRpDnAH4Ed+jj/UVS3DW4rY9wAbNiH2u+leu5iZqllOdt/AfYAvke1srAjsKPtV2y/QvWA7N7A81TPh/x314C2O6mewZhU9j9Q2tKqb09sX0P1rMyNZbwb2z0Htn8P7AOcCsymerD0Vc/UNNT1BuDcBJCIiPro1bfiI6I7HR0d7uzsHOoyIiKGFUlTbHc0b89vexEREVGrhI+lVPljX/O6+fnMUNc2UCQd08MxXjPUtUVExODJt12WUrY3GuoaBpvtE6j+PkdERCxDsvIRERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCRwAg6QxJ/zHUdfRG0jaSHhvqOiIiYsksP9QFRP0k7Q183vY/d22zfeDQVbT0u3fGbCYcfdVQl9G2aSdOHOoSIiJ6lJWPiIiIqFXCx1JC0tGSHpQ0V9KfJH2sbB8h6RRJz0p6WNIXJVnS8mX/KEnnSHpC0gxJ35Q0opd53g6cAWwlaZ6kWWX7jyV9s7zeRtJjko6U9HQZe2dJH5b0f5JmSjqmYczlGup/TtIlktYq+1aWdEHZPkvSHZLe2OJcrCXpR5Iel/S8pF/05ZyVfetJuknS7HLuLi7bJenUclyzJd0j6R3tfUoRETEQcttl6fEg8F7gSWAX4AJJ6wE7ATsAmwAvAJc29fsJ8BSwHrAa8CtgOnBmd5PYvk/SgTTddunGOsDKwFhgb+Bs4HrgXcC6wBRJF9l+CDgY2BnYGngG+C7wfWA3YC9gFDAOeLkcx4stzsX5wDxgo/K/7+mhXbfnzPYTwDeA64BtgRWBjtLng8D7gA2A2cDbgFndDS5pf2B/gBGvW7tFyRER0a6sfCwlbF9q+3Hbi21fDNwPbAHsCpxu+zHbzwMndvUpKwg7AIfafsH208CpwKcHoKQFwPG2FwAXAaNLHXNtTwWmAu8sbQ8AvlxqfBk4FvhkWZ1ZALweWM/2IttTbM/paVJJbyrHdKDt520vsH1Td217OWdd9Y8Hxth+yfZvG7avThU6ZPu+Ela6G/8s2x22O0asOqqNUxYREe1I+FhKSNpT0l3l1sQs4B1UF/wxVCsZXRpfjwdWAJ5o6Hcm8IYBKOk524vK666Viqca9r8IjGyo4/KGGu4DFgFvpFrFuBa4qNxG+U9JK/Qy7zhgZglaverlnAEcCQj4vaSpkvYFsH0jMIlqZeYpSWdJel2ruSIiYuAkfCwFJI2nuq3xReD1ttcA/kh18XwCeHND83ENr6dT3coYbXuN8vM62xu1mNIDV/1f69ihoYY1bK9se0ZZufi67X+kun3yEWDPFmOtJWmN3iZs
cc6w/aTt/WyPoVqZ+a9yGwvb37X9LqrbOhsARyzJwUdERN/kmY+lw2pUgeAZAEn7UP0WD3AJcIikq6ie+Tiqq5PtJyRdB5xS/kbHPOAtwJt7ulVRPAW8WdKKtl8ZgPrPAI6XtJftRyStDbzH9hWStgWeBf4EzKG67bGop4HKMV1DFRb+tRzTVrZ/09S0t3OGpF2AW20/Bjxf2i6StDlV6P4D1fl8qbd6umw8dhSd+fpqRMSAyMrHUsD2n4BTgFupgsHGwO/K7rOpHpy8B7gTuBpYyN8umHtSPVD5J6qL7GXAm1pMeSPVMxtPSnp2AA7hdOBK4DpJc4HbgC3LvnVKTXOobsfcBFzQYrzPUoWUPwNPA4c2N2hxzgA2B26XNK/Udojth4HXUZ3T54FHgOeAk/t2uBERsSRkD/QKfAwmSTsAZ9geP9S1LEs6Ojrc2dk51GVERAwrkqbY7mjenpWPpZykVcrf11he0ljga8DlQ11XREREfyV8LP0EfJ3qNsGdVLcuvtqyU/XfapnXzc8Zg1xvW3qobZ6k9w51bRERMbjywOlSzvZ8qucX+trvQGCp/e+12B7ZulVERLwWZeUjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUathGz4kbSjpTklzJR081PUMJEn/JOl+SfMk7TzAYx8j6Yfl9QRJlrT8QM5RF0mTJX1+qOuIiIi+GZYXneJIYLLtTYe6kEFwHDDJ9ukDPbDtEwZ6zGXBvTNmM+Hoq4a6jLZNO3HiUJcQEdGjYbvyAYwHpva109LwW34bNfTr2KJ/VBnO/7cQETGsDMv/hyvpRmBbYFK5NXFIuQUzR9J0Scc2tO26tfA5SY8CN5bt+0q6T9Lzkq6VNL6NeS3pYEkPSXpW0rcbL1q9jVn6/quk+4H7e5njQeAfgF+WY1tJ0j5l3Lll7gMa2m8j6TFJR0p6WtITknaW9GFJ/ydppqRjGtofK31wLe4AACAASURBVOmCbubdRdKUpm2HSfpFi3PyY0n/JemaUu/vJK0j6bRyHv4sadOG9mMk/VzSM5IebrxlVmq7VNIF5VjvlbSBpH8vxzZd0gebSnirpN9Lmi3pCklrNYz3bkm3SJol6W5J2zTsmyzpeEm/A+aXcx4RETUYluHD9r8ANwNftD0SuBvYE1gDmAgc1M2zElsDbwe2L/uOAT4OrF3GurDN6T8GdACbATsB+wK0OebOwJbAP/ZybG8FHgV2tD3S9svA08BHgNcB+wCnStqsods6wMrAWOCrwNnAHsC7gPcCX5XU6uJ6JfAWSW9v2LYHcH6LfgC7Al8BRgMvA7cCfyjvLwO+A1CC2i+pPq+xwHbAoZK2bxhrxzLnmsCdwLVU/07HUt2OOrNp7j2pPoMxwELgu2WuscBVwDeBtYDDgZ9LWruh72eB/YHVgUfaOM6IiBgAwzJ8NLM92fa9thfbvofqor91U7Njbb9g+0XgAOBbtu+zvRA4AdikndUP4CTbM20/CpwG7Fa2tzPmt0rfF/t4fFfZftCVm4DrqEJFlwXA8bYXABdRXfRPtz3X9lSqWzjvbDHHy8DFVIEDSRsBE4BftVHi5ban2H4JuBx4yfZ5theVMbtWPjYH1rZ9nO1XbD9EFZQ+3TDWzbavLefwUqogd2LDsU2QtEZD+/Nt/9H2C8B/ALtKGlGO42rbV5d/F9cDncCHG/r+2PZU2wvL+K8iaX9JnZI6F82f3cZpiIiIdrwmwoekLSX9uizlzwYOpLoAN5re8Ho8cHpZjp8FzARE9dt1K43jPEL1G3e7Yzb2bZukHSTdVm6hzKK6gDYe33PlQg/QFWyeatj/IjCyjal+
AuwuSVSrApeUUNJK81w9zT0eGNN1jsqxHAO8sZexnu3m2BqPpfnzWIHq3IwHdmma65+BN/XQ9+/YPst2h+2OEauO6q1pRET0wZA/fDlAfgZMAnaw/ZKk0/j78OGG19OpVgp+2o+5xvG3h0HXBR7vw5juZV+3JK0E/Jzq9sIVtheU5zDU17FasX2bpFeoVlV2Lz8DaTrwsO31B3DMcQ2v16VaBXq2zHW+7f166dvnzyMiIpbcayV8rA7MLMFjC6qL5nW9tD8D+Iaku2xPlTQK+KDtS9uY6whJt1P99n0I5XmGJRyzNysCKwHPAAsl7QB8EPjjEo7bk/OogtxC278d4LF/D8yRdBTVsxmvUD2Hs4rtO/o55h6SzgOmUT0TcpntReWh2jvK8yQ3UK2IvBt4wPZjfZ1k47Gj6MzXVyMiBsRr4rYL8AXgOElzqR64vKS3xrYvB04CLpI0h+pCvkObc10BTAHuonqg8ZwBGLO3WucCB1Md0/NUwerKJR23F+cD76C9B037pNw+2RHYBHiYaoXih8CS3NM4H/gx8CTVQ7cHl7mmUz0QfAxVcJsOHMFr5998RMSwJTsrz+2SZGB92w8MdS2DRdIqVN+u2cx2j18JXtZ0dHS4s7NzqMuIiBhWJE2x3dG8Pb8FRrODgDsSPCIiYrC8Vp75GBCS3gtc092+8vdEhtU8fSVpGtWDrDs3bZ9K9e2RZgf086HdiIhYhiV8NLB9M718JdX2gHzDpNU8Q8X2hB62b1RzKRER8RqW2y4RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWyw91Ae2SNA34vO0bWrQzsL7tB/oxR7/71knSj4HHbH+lzr79IWkC8DCwgu2FAzTmgHxOkiYDF9j+Yau2986YzYSjr1qS6Wox7cSJQ11CRERLWfmIiIiIWiV8xLAhadis1EVERM+GXfiQtIWkWyXNkvSEpEmSVmxq9mFJD0l6VtK3JS3X0H9fSfdJel7StZLG93H+lSSdLOlRSU9JOkPSKmXfNpIek3SYpKdLffs09F1F0imSHpE0W9JvG/p+VNLUclyTJb29od+mkv4gaa6ki4GVm2r6iKS7St9bJL2z3b69HGdvY06TdISkeyS9IOkcSW+UdE2Z5wZJazYNua+kx8s5OaxhrF4/T0mW9K+S7gfu76bOf5Y0XdK25X2Pn6+kD0j6czn3kwC1OAf7S+qU1Llo/ux2TltERLRh2IUPYBHwJWA0sBWwHfCFpjYfAzqAzYCdgH0BJO0MHAN8HFgbuBm4sI/znwRsAGwCrAeMBb7asH8dYFTZ/jng+w0X4pOBdwHvAdYCjgQWS9qg1HFoqetq4JeSViwX4l8A55c+lwKf6JpM0mbAucABwOuBM4ErS0jqtW9PehuzodkngA+Uc7EjcA3VuR1N9e/q4KZhtwXWBz4IHC3p/WV7O5/nzsCWwD821bk91Xn7hO1f9/b5ShoN/Bz4SpnrQeCfejsPts+y3WG7Y8Sqo3prGhERfTDswoftKbZvs73Q9jSqC+PWTc1Osj3T9qPAacBuZfsBwLds31cefjwB2KTd1Q9JAvYDvlTGn1vG+HRDswXAcbYX2L4amAdsWFZf9gUOsT3D9iLbt9h+GfgUcJXt620voAopq1CFlHcDKwCnlTEvA+5omG8/4Ezbt5cxfwK8XPq16tuT3sbs8j3bT9meQXWRv932neV4Lgc2bRrz67ZfsH0v8CPKZ9Lm5/mtcr5fbNi2C3AW8GHbvy/bevt8Pwz8yfZl5RyfBjzZxrmIiIgBNuzuoZdVgu9QrWysSnUMU5qaTW94/Qgwprwe
D5wu6ZTGIalWKR5pY/q1y5xTqhzy1/4jGto81/StjvnASKrftlem+o272ZjG+W0vljS91LUImGHbTcfUZTywl6R/a9i2YhnTLfr2pLcxuzzV8PrFbt6PbBqz+TPZGPr1eXY5FDivhJnGunv6fMc0jmPb5RxHRETNhl34AH4A3AnsZnuupEOBTza1GQdMLa/XBR4vr6cDx9v+aT/nfpbqwrpR+Y2/r31fAt4K3N2073HKxRj+usIyDphBFSDGSlJDiFiXv4WYrmM6vnlCSVu36NuTHsdcAuOAPzfU0PWZtPN5mr+3C3COpBm2T2uq++8+X0nrlxq63qvxfSsbjx1FZ77GGhExIIbdbRdgdWAOME/S24CDumlzhKQ1JY0DDgEuLtvPAP5d0kYAkkZJ2qXdiW0vBs4GTpX0hjLG2PLsQTt9zwW+I2mMpBGStirPUVwCTJS0naQVgMOobnPcAtwKLAQOlrS8pI8DWzQMfTZwoKQtVVlN0kRJq7fRtye9jdlf/yFp1XLu9+Fvn0k7n2d3Hqd6PuRgSV3PiPT2+V4FbCTp46q+NXMw1fM5ERFRs+EYPg4HdgfmUl0kL+6mzRVUS/d3UV10zgGwfTnVA6MXSZoD/BHYoY/zHwU8ANxWxrgB2LAPtd9L9dzFzFLLcrb/AuwBfI9qhWRHYEfbr9h+heoByr2B56meD/nvrgFtd1I9ozGp7H+gtKVV3570NuYSuKmM87/AybavK9vb+Tx7qvNRqgBylKTP9/b52n6WarXkROA5qodff7eExxQREf2gVz8OEBHd6ejocGdn51CXERExrEiaYrujeftwXPmIiIiIYSzhoxuq/tjXvG5+PjPUtQ0UScf0cIzXDHVtERHx2jYcv+0y6GxvNNQ1DDbbJ1D9HYyIiIhaZeUjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUavlh7qAGN4kHQusZ3uPHvZ/BtjL9gcHaX4D69t+YDDnvnfGbCYcfdWSDjPopp04cahLiIhoKSsfMWAkTZBkSX8NtbZ/OljBo5WhnDsiInqW8BERERG1SvhYRkmaJukISfdIekHSOZLeKOkaSXMl3SBpTUnbSHqsm77v72bY35T/nSVpnqStJO0t6bdt1LORpOslzZT0lKRjyvYtJN0qaZakJyRNkrRiU/cPS3pI0rOSvi1pudL3VXOXVZkDJd0v6XlJ35ekPp24iIhYYgkfy7ZPAB8ANgB2BK4BjgFGU/3bOLiP472v/O8atkfavrWdTpJWB24A/gcYA6wH/G/ZvQj4UqlpK2A74AtNQ3wM6AA2A3YC9u1luo8AmwP/D9gV2L6XuvaX1Cmpc9H82e0cSkREtCHhY9n2PdtP2Z4B3AzcbvtO2y8DlwOb1lTHR4AnbZ9i+yXbc23fDmB7iu3bbC+0PQ04E9i6qf9JtmfafhQ4Dditl7lOtD2rtP01sElPDW2fZbvDdseIVUctyfFFRESDfNtl2fZUw+sXu3k/sqY6xgEPdrdD0gbAd6hWNlal+jc7panZ9IbXj1CtnvTkyYbX86nvGCMiokj4iFZeoLroAyBpBLB2D23dzzmm0/NqxQ+AO4HdbM+VdCjwyaY244Cp5fW6wOP9rKNHG48dRWe+xhoRMSBy2yVa+T9gZUkTJa0AfAVYqYe2zwCLgX/o4xy/AtaRdKiklSStLmnLsm91YA4wT9LbgIO66X9EeTh2HHAIcHEf54+IiBolfESvbM+mesDzh8AMqpWQx3poOx84Hvhd+XbKu9ucYy7Vg687Ut0WuR/Ytuw+HNgdmAucTffB4gqqWzF3AVcB57Qzb0REDA3Z/V0pj1h2dHR0uLOz
c6jLiIgYViRNsd3RvD0rHxEREVGrPHAatZD0Xqq/I/J3bOcbJxERy5CEj6iF7ZvJ11ojIoLcdomIiIiaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUavmhLiBiOLh3xmwmHH3VUJfRo2knThzqEiIi2paVj3jNk3SspAuGuo6IiKgkfEREREStEj7iNUVSbiVGRCzlEj4GkaRpkg6XdI+k2ZIulrSypDUl/UrSM5KeL6/f3NBvsqRvSrpF0jxJv5T0ekk/lTRH0h2SJjS0f5uk6yXNlPQXSbu2UduPJX1f0lWS5kq6XdJby74Jktx4IS81fb683lvS7ySdKmmWpIckvadsny7paUl7tZj/LaXvcuX9DyU93bD/AkmHltdjJF1Zju8BSfs1tDtW0mWl/Rxg7zL2TeW4rgdGN7RfubR9rsx/h6Q39lDj/pI6JXUumj+71SmNiIg2JXwMvl2BDwFvAd4J7E113n8EjAfWBV4EJjX1+zTwWWAs8Fbg1tJnLeA+4GsAklYDrgd+BrwB2A34L0kbtVHbbsDXgTWBB4Dj+3BcWwL3AK8vc18EbA6sB+wBTJI0sqfOth8G5gCblk3vBeZJent5/z7gpvL6QuAxYAzwSeAESds1DLcTcBmwBvDTUs8UqtDxDaAxCO0FjALGldoPpDr/3dV4lu0O2x0jVh3V27mIiIg+SPgYfN+1/bjtmcAvgU1sP2f757bn255LddHfuqnfj2w/aHs2cA3woO0bbC8ELuVvF+2PANNs/8j2Qtt/AH5OdZFu5b9t/76M+VNgkz4c18NlzkXAxVQX8+Nsv2z7OuAVqiDSm5uArSWtU95fVt6/BXgdcLekccA/A0fZfsn2XcAPqYJZl1tt/8L2YmBtqhD0H6WW31Cd9y4LqELHerYX2Z5ie04fjjsiIpZQ7o8PvicbXs8HxkhaFTiVakVkzbJvdUkjysUc4KmGfi92875rVWE8sKWkWQ37lwfO70dtPa5UdKO5Hmz3VGNPbgI+SrWq8RtgMlWoeAm42fZiSWOAmSWkdXkE6Gh4P73h9RjgedsvNLUfV16fX15fJGkN4ALgy7YXtKg1IiIGSMLH0DgM2BDY0vaTkjYB7gTUj7GmAzfZ/sAA1td14V6V6tYIwDo9tF0SNwHfpgofNwG/Bc6gCh9dt1weB9aStHpDAFkXmNEwjhtePwGsKWm1hgCyblebEjK+Dny9PDdzNfAX4JzeCt147Cg687c0IiIGRG67DI3VqVYGZklai/L8Rj/9CthA0mclrVB+Nm94dqLPbD9DdXHfQ9IISftSPXcyoGzfT3Ue9gB+U25/PAV8ghI+bE8HbgG+VR4WfSfwOarbRN2N+QjQSRUuVpT0z8COXfslbStpY0kjqILVAmBRd2NFRMTgSPgYGqcBqwDPArcB/9PfgcpqwAepHlB9nOpWyknASktY437AEcBzwEZUAWAw3AQ8Z/vRhveiWgnqshswger4Lge+Zvv6XsbcneqB2JlUwe68hn3rUD1bMofqwd2bqG69RERETWS7dauIZVxHR4c7OzuHuoyIiGFF0hTbHc3bs/IRERERtUr4eA2TNLX8kbLmn88sSzVERMTSJd92eQ2z3c4fGnvN1xAREUuXrHxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjhhVJG0q6U9JcSQcPdT0R
EdF3yw91ARF9dCQw2famdU5674zZTDj6qjqnBGDaiRNrnzMiYrBl5SOGm/HA1L52kpSgHRGxlEj4iGFD0o3AtsAkSfMkHVJuwcyRNF3SsQ1tJ0iypM9JehS4sWzfV9J9kp6XdK2k8UNzNBERy66Ejxg2bP8LcDPwRdsjgbuBPYE1gInAQZJ2buq2NfB2YPuy7xjg48DaZawLayo/IiKKhI8YtmxPtn2v7cW276EKEls3NTvW9gu2XwQOAL5l+z7bC4ETgE16Wv2QtL+kTkmdi+bPHtRjiYhYliR8xLAlaUtJv5b0jKTZwIHA6KZm0xtejwdOlzRL0ixgJiBgbHfj2z7LdoftjhGrjhqMQ4iIWCYlfMRw9jPgSmCc7VHAGVRhopEbXk8HDrC9RsPPKrZvqaneiIggX7WN4W11YKbtlyRtAewOXNdL+zOAb0i6y/ZUSaOAD9q+tNVEG48dRWe+9hoRMSCy8hHD2ReA4yTNBb4KXNJbY9uXAycBF0maA/wR2GHQq4yIiFeR7datIpZxHR0d7uzsHOoyIiKGFUlTbHc0b8/KR0RERNQq4SP+//buP8iusr7j+PsjsQgJCQKR4YcEC0grUDLtKtNarSM6ELX+aIulpINSlTodp7XWGbEzVp0RTa1YsXWKULFYi/xQcYqCONOKVkYdFrUqICKayI8EQyAhCZUKfPvHPTu9LLub3eTuc/cm79fMmbl7znOe83zvk10+POfcXUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhyRJasrwIUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhyRJasrwIUmSmlo07AFIo+B7d2/hyHO+0Ox6a9e8tNm1JKk1Vz4kSVJThg9JktSU4UMjKck5Se5IsjXJLUle1e3fK8l5Se5L8pMkb0pSSRZ1x5cl+ViS9UnuTvKeJHsNtxpJ2rP4zIdG1R3A84ANwGnAJ5McDbwCWAWsBLYDV0467xLgXuBoYDHweeBO4KOTL5DkbOBsgL2WLp+XIiRpT+TKh0ZSVV1ZVfdU1WNVdTlwO/Ac4NXA+VV1V1U9AKyZOCfJwfSCyZurantV/Qz4e+D0aa5xYVWNVdXYXvsum/eaJGlP4cqHRlKSM4G3AEd2u5YABwGH0lvJmND/egXwZGB9kol9T5rURpI0zwwfGjlJVgAXAScDX6+qR5N8BwiwHji8r/nT+17fCTwMHFRVj7QaryTp8QwfGkWLgQI2AiQ5Czi+O3YF8BdJvkDvmY+3TZxUVeuTfAk4L8k7gG3AM4DDq+orM13whMOWMe7v3pCkgfCZD42cqroFOA/4Or2HR08AbugOXwR8Cfgu8G3gGuAR4NHu+JnALwG3AA8AnwYOaTV2SRKkqoY9BmneJFkFXFBVK3aln7GxsRofHx/QqCRpz5Dkpqoam7zflQ/tVpLsk+QlSRYlOQx4J3DVsMclSfp/hg/tbgK8m94tlW8DtwJ/M9QRSZIexwdOtVupqoeAZw97HJKk6bnyIUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhyRJasrwIUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhwYmyc1JXjAP/f5LkvcMul9J0nAsGvYAtPuoquOGPQZJ0sLnyockSWrK8KGBSbI2yYuSPCfJeJIHk9yb5IOzOPfKJBuSbEny1STTrqIkeUOSHyW5P8m/Jzm071gleWOS25M8kOQjSdJ3/E+S3Noduy7Jihmuc3ZXx/jGjRvn8lZIkmZg+NB8OB84v6qWAkcBV8zinGuBY4CnAd8C/m2qRkle
CLwPeDVwCLAOuGxSs5cBzwZO7Nqd0p37SuCvgd8DlgP/BXxqugFV1YVVNVZVY8uXL59FCZKk2TB8aD78Ajg6yUFVta2qvrGjE6rq4qraWlUPA+8CTkyybIqmq4GLq+pbXdu3A7+Z5Mi+NmuqanNV/RT4MrCy2/+nwPuq6taqegR4L7ByptUPSdLgGT40H14HPBP4QZIbk7xspsZJ9kqyJskdSR4E1naHDpqi+aH0VjsAqKptwCbgsL42G/pePwQs6V6vAM5PsjnJZuB+IJPOlSTNMz/tooGrqtuBP0ryJHq3OD6d5MCq2j7NKWcArwBeRC94LAMeoBcMJruHXogAIMli4EDg7lkM7U7g3Kqa8paOJKkNVz40cEn+OMnyqnoM2NztfnSGU/YDHqa3grEvvdsh07kUOCvJyiR7d22/WVVrZzG0C4C3TzzMmmRZktNmcZ4kaYAMH5oPpwI3J9lG7+HT06vq5zO0/wS9Wyl3A7cA0z4jUlX/AbwD+Aywnt4DrafPZlBVdRXwt8Bl3e2d7wOrZnOuJGlwUlXDHoO04I2NjdX4+PiwhyFJIyXJTVU1Nnm/Kx+SJKkpw4eaSLI6ybYptpuHPTZJUlt+2kVNdJ8w8VMmkiRXPiRJUluGD0mS1JThQ5IkNWX4kCRJTRk+JElSU4YPSZLUlOFDkiQ1ZfiQJElNGT4kSVJThg9JktSU4UOSJDVl+JAkSU0ZPiRJUlOGD0mS1JThQ5IkNWX4kCRJTRk+JElSU4YPSZLUlOFDkiQ1ZfiQJElNGT4kSVJThg9JktSU4UOSJDVl+NCMkqxN8qJhj2M6SV6b5GszHL82yWtajkmSNLNFwx6ANJ+qatWwxyBJejxXPiRJUlOGD83GyiTfTbIlyeVJnpLkqUk+n2Rjkge614dPnNDdDvlxkq1JfpJk9Y4ukuQNSW7tzrklya93+89Jckff/lc98dT8Qze+HyQ5ue/A9Ule3zemryX5QDfmnySZdmUkydlJxpOMb9y4cc5vmiRpaoYPzcargVOBZwC/BryW3r+djwMrgCOA/wH+ESDJYuDDwKqq2g/4LeA7M10gyWnAu4AzgaXAy4FN3eE7gOcBy4B3A59Mckjf6ScBPwYOAt4JfDbJAdNc6iTgtq7t+4GPJclUDavqwqoaq6qx5cuXzzR8SdIcGD40Gx+uqnuq6n7gamBlVW2qqs9U1UNVtRU4F/idvnMeA45Psk9Vra+qm3dwjdcD76+qG6vnR1W1DqCqruyu/1hVXQ7cDjyn79yfAR+qql90x28DXjrNddZV1UVV9ShwCXAIcPDc3g5J0q4wfGg2NvS9fghYkmTfJB9Nsi7Jg8BXgf2T7FVV24E/BN4IrE/yhSS/soNrPJ3eCscTJDkzyXeSbE6yGTie3srFhLurqvq+XgccuqNaquqh7uWSHYxNkjRAhg/trL8CjgVOqqqlwPO7/QGoquuq6sX0VhZ+AFy0g/7uBI6avDPJiu7cNwEHVtX+wPcnrtM5bNKtkyOAe+ZckSSpCcOHdtZ+9J7z2Nw9X/HOiQNJDk7y8u7Zj4eBbcCjO+jvn4G3JvmN9BzdBY/FQAEbu77Porfy0e9pwJ8neXL37MivAtfseomSpPlg+NDO+hCwD3Af8A3gi33HnkRvZeQe4H56z4L82UydVdWV9J4buRTYCnwOOKCqbgHOA74O3AucANww6fRvAsd0YzkX+IOq2oQkaUHK42+VS5rK2NhYjY+PD3sYkjRSktxUVWOT97vyIUmSmjJ8qJkkFyTZNsV2wbDHJklqx7/tomaq6o30Pn4rSdqDufIhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhyRJasrwIUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKmpVNWwxyAteEm2ArcNexwDchBw37AHMSC7Sy27Sx1gLQvVsGpZUVXLJ+9cNISBSKPotqoaG/YgBiHJuLUs
LLtLHWAtC9VCq8XbLpIkqSnDhyRJasrwIc3OhcMewABZy8Kzu9QB1rJQLahafOBUkiQ15cqHJElqmXAgjQAAA59JREFUyvAhSZKaMnxoj5TkgCRXJdmeZF2SM2Zo+5dJNiTZkuTiJHvvTD/zZYC1XJ/k50m2dVvz32sy21qSHJ/kuiT3JXnCveNRmpdZ1DJK8/KaJDcleTDJXUnen2TRXPuZTwOsZZTm5fQkt3Xf9z9LckmSpXPtZ5AMH9pTfQT4X+BgYDXwT0mOm9woySnAOcDJwJHALwPvnms/82xQtQC8qaqWdNux8zrqqc32/fwFcAXwul3sZz4NqhYYnXnZF3gzvV9odRK9f2tv3Yl+5tOgaoHRmZcbgOdW1TJ63/eLgPfsRD+DU1VubnvUBizuvtGe2bfvX4E1U7S9FHhv39cnAxvm2s9Cr6X7+nrg9aMwL33Hj+79GNu1fhZqLaM6L33t3gJcPcrzMlUtozwvwBLgE8A1w5wXVz60J3om8GhV/bBv338DUyX947pj/e0OTnLgHPuZL4OqZcL7uuX/G5K8YOCjndmg3s9Rm5fZGNV5eT5w8wD6GZRB1TJhZOYlyW8n2QJsBX4f+NDO9DMohg/tiZYAWybt2wLsN4u2E6/3m2M/82VQtQC8jd6S7GH0fifA1UmOGtxQd2hQ7+eozcuOjOS8JDkLGAM+sCv9DNigaoERm5eq+lr1brscDvwdsHZn+hkUw4f2RNuApZP2LaX3fwQ7ajvxeusc+5kvg6qFqvpmVW2tqoer6hJ694lfMuDxzmRQ7+eozcuMRnFekrwSWAOsqqqJP2Y2kvMyTS0jOS8AVXU38EXgsl3pZ1cZPrQn+iGwKMkxfftO5IlLqnT7TpzU7t6q2jTHfubLoGqZSgEZyChnZ1Dv56jNy1wt6HlJcipwEfC7VfW9ne1nngyqlqks6HmZZBEwsUoznHkZ1sMybm7D3Oil/k/Re9jqufSWGY+bot2pwAbgWcBTgf+k70Gs2faz0GsB9gdOAZ5C7wfTamA7cOwCrSXdWJ9F74f+U4C9R3Repq1lBOflhcAm4Pm70s9Cr2UE52U1cET3b20F8BXgs8Ocl2ZvkpvbQtqAA4DPdT8wfgqc0e0/gt4y5BF9bd8C3As8CHx80n/kpuxn1GoBlgM30ltq3Qx8A3jxQq2F3keFa9K2dhTnZaZaRnBevgw80u2b2K4d0XmZtpYRnJdzgbu6dnfRe0blwGHOi3/bRZIkNeUzH5IkqSnDhyRJasrwIUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKmp/wMLIvy9/Qk+TAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x648 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(6, 9))\n",
    "\n",
    "ind = np.argsort(xgb_model.feature_importances_)[::-1]\n",
    "features_sorted = np.array(features)[ind]\n",
    "importances_sorted = xgb_model.feature_importances_[ind]\n",
    "\n",
    "plt.barh(y=range(len(features)), width=importances_sorted, height=0.2)\n",
    "plt.title('Gain')\n",
    "plt.yticks(ticks=range(len(features)), labels=features_sorted)\n",
    "plt.gca().invert_yaxis()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling (part 2): Linear models & Ensembles\n",
    "\n",
    "Given the randomness of the _Titanic dataset_ , we can be satisfied with the performance of `xgboost` model above. Still, it is always usefull to try a variety of models and approaches, especially since `vaex` makes makes this process rather simple. \n",
    "\n",
    "In the following part we will use a couple of linear models as our predictors, this time straight from `scikit-learn`. This requires us to pre-process the data in a slightly different way.\n",
    "\n",
    "### Feature pre-processing for linear models\n",
    "\n",
    "When using linear models, the safest option is to encode categorical variables with the one-hot encoding scheme, especially if they have low cardinality. We will do this for the \"family_size\" and \"deck\" features. Note that the \"sex\" feature is already encoded since it has only unique values options. \n",
    "\n",
    "The \"name_title\" feature is a bit more tricky. Since in its original form it has some values that only appear a couple of times, we will do a trick: we will one-hot encode the frequency encoded values. This will reduce cardinality of the feature, while also preserving the most important, i.e. most common values.\n",
    "\n",
    "Regarding the \"age\" and \"fare\", to add some variance in the model, we will not convert them to categorical as before, but simply remove their mean and standard-deviations (standard-scaling). We will do the same to the \"fare_per_family_member\" feature.\n",
    "\n",
    "\n",
    "Finally, we will drop out any other features."
   ]
  },
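  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two transformations described above concrete, here is a minimal NumPy sketch of what one-hot encoding and standard scaling compute. It uses hypothetical toy values rather than the Titanic data, and it only illustrates the arithmetic; it is not meant to show how `vaex.ml` implements these transformers.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy values, not taken from the Titanic data\n",
    "fare = np.array([10.0, 20.0, 30.0])\n",
    "deck = np.array(['A', 'B', 'A'])\n",
    "\n",
    "# Standard scaling: subtract the mean, then divide by the standard deviation\n",
    "fare_scaled = (fare - fare.mean()) / fare.std()\n",
    "\n",
    "# One-hot encoding: one 0/1 column per unique category\n",
    "categories = np.unique(deck)\n",
    "deck_one_hot = (deck[:, None] == categories[None, :]).astype(int)\n",
    "```\n",
    "\n",
    "After scaling, the toy `fare` values have zero mean and unit standard deviation, and each row of `deck_one_hot` contains exactly one 1."
   ]
  },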
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:41.979030Z",
     "start_time": "2020-05-01T17:12:41.922481Z"
    }
   },
   "outputs": [],
   "source": [
    "# One-hot encode categorical features\n",
    "one_hot = vaex.ml.OneHotEncoder(features=['deck', 'family_size', 'name_title'])\n",
    "df_train = one_hot.fit_transform(df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.072684Z",
     "start_time": "2020-05-01T17:12:41.988593Z"
    }
   },
   "outputs": [],
   "source": [
    "# Standard scale numerical features\n",
    "standard_scaler = vaex.ml.StandardScaler(features=['age', 'fare', 'fare_per_family_member'])\n",
    "df_train = standard_scaler.fit_transform(df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.088401Z",
     "start_time": "2020-05-01T17:12:42.076102Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['deck_A',\n",
       " 'deck_B',\n",
       " 'deck_C',\n",
       " 'deck_D',\n",
       " 'deck_E',\n",
       " 'deck_F',\n",
       " 'deck_G',\n",
       " 'deck_M',\n",
       " 'family_size_1',\n",
       " 'family_size_2',\n",
       " 'family_size_3',\n",
       " 'family_size_4',\n",
       " 'family_size_5',\n",
       " 'family_size_6',\n",
       " 'family_size_7',\n",
       " 'family_size_8',\n",
       " 'family_size_11',\n",
       " 'standard_scaled_age',\n",
       " 'standard_scaled_fare',\n",
       " 'standard_scaled_fare_per_family_member',\n",
       " 'label_encoded_sex']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the features for training a linear model\n",
    "features_linear = df_train.get_column_names(regex='^deck_|^family_size_|^frequency_encoded_name_title_')\n",
    "features_linear += df_train.get_column_names(regex='^standard_scaled_')\n",
    "features_linear += ['label_encoded_sex']\n",
    "features_linear"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Estimators: `SVC` and `LogisticRegression`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.170145Z",
     "start_time": "2020-05-01T17:12:42.095159Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC\n",
    "from sklearn.linear_model import LogisticRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.646357Z",
     "start_time": "2020-05-01T17:12:42.172042Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/jovan/miniconda3/lib/python3.7/site-packages/sklearn/svm/_base.py:231: ConvergenceWarning: Solver terminated early (max_iter=1000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.\n",
      "  % self.max_iter, ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                            </th><th>sex   </th><th style=\"text-align: right;\">  age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket   </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest           </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th>prediction_xgb  </th><th style=\"text-align: right;\">  deck_A</th><th style=\"text-align: right;\">  deck_B</th><th style=\"text-align: right;\">  deck_C</th><th style=\"text-align: right;\">  deck_D</th><th style=\"text-align: right;\">  deck_E</th><th style=\"text-align: right;\">  deck_F</th><th style=\"text-align: right;\">  deck_G</th><th style=\"text-align: right;\">  deck_M</th><th style=\"text-align: right;\">  family_size_1</th><th style=\"text-align: right;\">  family_size_2</th><th style=\"text-align: right;\">  family_size_3</th><th style=\"text-align: right;\">  family_size_4</th><th style=\"text-align: right;\">  family_size_5</th><th style=\"text-align: right;\">  family_size_6</th><th style=\"text-align: right;\">  family_size_7</th><th style=\"text-align: right;\">  family_size_8</th><th style=\"text-align: right;\">  
family_size_11</th><th style=\"text-align: right;\">  name_title_Capt</th><th style=\"text-align: right;\">  name_title_Col</th><th style=\"text-align: right;\">  name_title_Countess</th><th style=\"text-align: right;\">  name_title_Don</th><th style=\"text-align: right;\">  name_title_Dona</th><th style=\"text-align: right;\">  name_title_Dr</th><th style=\"text-align: right;\">  name_title_Jonkheer</th><th style=\"text-align: right;\">  name_title_Lady</th><th style=\"text-align: right;\">  name_title_Major</th><th style=\"text-align: right;\">  name_title_Master</th><th style=\"text-align: right;\">  name_title_Miss</th><th style=\"text-align: right;\">  name_title_Mlle</th><th style=\"text-align: right;\">  name_title_Mme</th><th style=\"text-align: right;\">  name_title_Mr</th><th style=\"text-align: right;\">  name_title_Mrs</th><th style=\"text-align: right;\">  name_title_Ms</th><th style=\"text-align: right;\">  name_title_Rev</th><th style=\"text-align: right;\">  standard_scaled_age</th><th style=\"text-align: right;\">  standard_scaled_fare</th><th style=\"text-align: right;\">  standard_scaled_fare_per_family_member</th><th>prediction_svc  </th><th>prediction_lr  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Stoytcheff, Mr. Ilia            </td><td>male  </td><td style=\"text-align: right;\">   19</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>349205   </td><td style=\"text-align: right;\"> 7.8958</td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               57</td><td style=\"text-align: right;\">                  7.8958</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">         
      0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.807704</td><td style=\"text-align: right;\">             -0.493719</td><td style=\"text-align: right;\">                               -0.342804</td><td>False           </td><td>False          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Payne, Mr. Vivian Ponsonby      </td><td>male  </td><td style=\"text-align: right;\">   23</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>12749    </td><td style=\"text-align: right;\">93.5   </td><td>B24    </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>Montreal, PQ        </td><td>Mr          </td><td style=\"text-align: right;\">               4</td><td>B     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               23</td><td style=\"text-align: right;\">                 93.5   </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   1</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">         
      0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.492921</td><td style=\"text-align: right;\">              1.19613 </td><td style=\"text-align: right;\">                                1.99718 </td><td>False           </td><td>True           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       3</td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)</td><td>female</td><td style=\"text-align: right;\">   35</td><td style=\"text-align: right;\">      1</td><td style=\"text-align: right;\">      1</td><td>C.A. 2673</td><td style=\"text-align: right;\">20.25  </td><td>M      </td><td>S         </td><td>A     </td><td style=\"text-align: right;\">   nan</td><td>East Providence, RI </td><td>Mrs         </td><td style=\"text-align: right;\">               5</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            3</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">              105</td><td style=\"text-align: right;\">                  6.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">         
      0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">             0.45143 </td><td style=\"text-align: right;\">             -0.249845</td><td style=\"text-align: right;\">                               -0.374124</td><td>True            </td><td>True           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Hocking, Miss. Ellen \"Nellie\"   </td><td>female</td><td style=\"text-align: right;\">   20</td><td style=\"text-align: right;\">      2</td><td style=\"text-align: right;\">      1</td><td>29105    </td><td style=\"text-align: right;\">23     </td><td>M      </td><td>S         </td><td>4     </td><td style=\"text-align: right;\">   nan</td><td>Cornwall / Akron, OH</td><td>Miss        </td><td style=\"text-align: right;\">               4</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            4</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               40</td><td style=\"text-align: right;\">                  5.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.201528</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">       
        0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                1</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.729008</td><td style=\"text-align: right;\">             -0.195559</td><td style=\"text-align: right;\">                               -0.401459</td><td>True            </td><td>True           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Nilsson, Mr. August Ferdinand   </td><td>male  </td><td style=\"text-align: right;\">   21</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>350410   </td><td style=\"text-align: right;\"> 7.8542</td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                </td><td>Mr          </td><td style=\"text-align: right;\">               4</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               63</td><td style=\"text-align: right;\">                  7.8542</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">         
      0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">            -0.650312</td><td style=\"text-align: right;\">             -0.494541</td><td style=\"text-align: right;\">                               -0.343941</td><td>False           </td><td>False          </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                              sex       age    sibsp    parch  ticket        fare  cabin    embarked    boat      body  home_dest             name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title  prediction_xgb      deck_A    deck_B    deck_C    deck_D    deck_E    deck_F    deck_G    deck_M    family_size_1    family_size_2    family_size_3    family_size_4    family_size_5    family_size_6    family_size_7    family_size_8    family_size_11    name_title_Capt    name_title_Col    name_title_Countess    name_title_Don    name_title_Dona    name_title_Dr    name_title_Jonkheer    name_title_Lady    name_title_Major    name_title_Master    name_title_Miss    name_title_Mlle    name_title_Mme    name_title_Mr    name_title_Mrs    name_title_Ms    name_title_Rev    standard_scaled_age    standard_scaled_fare    standard_scaled_fare_per_family_member  prediction_svc    prediction_lr\n",
       "  0         3  False       Stoytcheff, Mr. Ilia              male       19        0        0  349205      7.8958  M        S           None       nan  None                  Mr                           3  M                   0            1              1           0                 57                    7.8958                    0                         0                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0              -0.807704               -0.493719                                 -0.342804  False             False\n",
       "  1         1  False       Payne, Mr. Vivian Ponsonby        male       23        0        0  12749      93.5     B24      S           None       nan  Montreal, PQ          Mr                           4  B                   0            1              1           0                 23                   93.5                       0                         0                     1                        0.578797  False                    0         1         0         0         0         0         0         0                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0              -0.492921                1.19613                                   1.99718   False             True\n",
       "  2         3  True        Abbott, Mrs. Stanton (Rosa Hunt)  female     35        1        1  C.A. 2673  20.25    M        S           A          nan  East Providence, RI   Mrs                          5  M                   0            1              3           0                105                    6.75                      1                         0                     0                        0.145177  True                     0         0         0         0         0         0         0         1                0                0                1                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                0                 1                0                 0               0.45143                -0.249845                                 -0.374124  True              True\n",
       "  3         2  True        Hocking, Miss. Ellen \"Nellie\"     female     20        2        1  29105      23       M        S           4          nan  Cornwall / Akron, OH  Miss                         4  M                   0            1              4           0                 40                    5.75                      1                         0                     0                        0.201528  True                     0         0         0         0         0         0         0         1                0                0                0                1                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  1                  0                 0                0                 0                0                 0              -0.729008               -0.195559                                 -0.401459  True              True\n",
       "  4         3  False       Nilsson, Mr. August Ferdinand     male       21        0        0  350410      7.8542  M        S           None       nan  None                  Mr                           4  M                   0            1              1           0                 63                    7.8542                    0                         0                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0              -0.650312               -0.494541                                 -0.343941  False             False"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The Support Vector Classifier\n",
    "vaex_svc = vaex.ml.sklearn.Predictor(features=features_linear, \n",
    "                                     target='survived',\n",
    "                                     model=SVC(max_iter=1000, random_state=42),\n",
    "                                     prediction_name='prediction_svc')\n",
    "\n",
    "# Logistic Regression\n",
    "vaex_logistic = vaex.ml.sklearn.Predictor(features=features_linear, \n",
    "                                          target='survived',\n",
    "                                          model=LogisticRegression(max_iter=1000, random_state=42),\n",
    "                                          prediction_name='prediction_lr')\n",
    "\n",
    "# Train the new models and apply the transformation to the train dataframe\n",
    "for model in [vaex_svc, vaex_logistic]:\n",
    "    model.fit(df_train)\n",
    "    df_train = model.transform(df_train)\n",
    "    \n",
    "# Preview of the train DataFrame\n",
    "df_train.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ensemble\n",
    "\n",
    "Just as before, the predictions from the `SVC` and the `LogisticRegression` classifiers are added as virtual columns in the training dataset. This is quite powerful, since now we can easily use them to create an ensemble! For example, let's do a weighted mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:42.958447Z",
     "start_time": "2020-05-01T17:12:42.653715Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>prediction_xgb  </th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>False           </td><td>False           </td><td>True           </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>True            </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>True            </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td>...                              </td><td>...             </td><td>...             </td><td>...            </td><td>...               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>False           </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>True            </td><td>True            </td><td>False          </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>False           </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      prediction_xgb    prediction_svc    prediction_lr    prediction_final\n",
       "0      False             False             False            False\n",
       "1      False             False             True             False\n",
       "2      True              True              True             True\n",
       "3      True              True              True             True\n",
       "4      False             False             False            False\n",
       "...    ...               ...               ...              ...\n",
       "1,042  False             False             False            False\n",
       "1,043  False             True              True             True\n",
       "1,044  True              True              False            True\n",
       "1,045  False             True              True             True\n",
       "1,046  False             False             False            False"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Weighed mean of the classes\n",
    "prediction_final = (df_train.prediction_xgb.astype('int') * 0.3 + \n",
    "                    df_train.prediction_svc.astype('int') * 0.5 + \n",
    "                    df_train.prediction_xgb.astype('int') * 0.2)\n",
    "# Get the predicted class\n",
    "prediction_final = (prediction_final >= 0.5)\n",
    "# Add the expression to the train DataFrame\n",
    "df_train['prediction_final'] = prediction_final\n",
    "\n",
    "# Preview\n",
    "df_train[df_train.get_column_names(regex='^predict')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance (part 2)\n",
    "\n",
    "Applying the ensembler to the test set is just as easy as before. We just need to get the new state of the training DataFrame, and transfer it to the test DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:43.334411Z",
     "start_time": "2020-05-01T17:12:42.961373Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                                        </th><th>sex   </th><th style=\"text-align: right;\">   age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket          </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest               </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th>prediction_xgb  </th><th style=\"text-align: right;\">  deck_A</th><th style=\"text-align: right;\">  deck_B</th><th style=\"text-align: right;\">  deck_C</th><th style=\"text-align: right;\">  deck_D</th><th style=\"text-align: right;\">  deck_E</th><th style=\"text-align: right;\">  deck_F</th><th style=\"text-align: right;\">  deck_G</th><th style=\"text-align: right;\">  deck_M</th><th style=\"text-align: right;\">  family_size_1</th><th style=\"text-align: right;\">  family_size_2</th><th style=\"text-align: right;\">  family_size_3</th><th style=\"text-align: right;\">  family_size_4</th><th style=\"text-align: right;\">  family_size_5</th><th style=\"text-align: right;\">  family_size_6</th><th style=\"text-align: right;\">  family_size_7</th><th style=\"text-align: right;\">  family_size_8</th><th 
style=\"text-align: right;\">  family_size_11</th><th style=\"text-align: right;\">  name_title_Capt</th><th style=\"text-align: right;\">  name_title_Col</th><th style=\"text-align: right;\">  name_title_Countess</th><th style=\"text-align: right;\">  name_title_Don</th><th style=\"text-align: right;\">  name_title_Dona</th><th style=\"text-align: right;\">  name_title_Dr</th><th style=\"text-align: right;\">  name_title_Jonkheer</th><th style=\"text-align: right;\">  name_title_Lady</th><th style=\"text-align: right;\">  name_title_Major</th><th style=\"text-align: right;\">  name_title_Master</th><th style=\"text-align: right;\">  name_title_Miss</th><th style=\"text-align: right;\">  name_title_Mlle</th><th style=\"text-align: right;\">  name_title_Mme</th><th style=\"text-align: right;\">  name_title_Mr</th><th style=\"text-align: right;\">  name_title_Mrs</th><th style=\"text-align: right;\">  name_title_Ms</th><th style=\"text-align: right;\">  name_title_Rev</th><th style=\"text-align: right;\">  standard_scaled_age</th><th style=\"text-align: right;\">  standard_scaled_fare</th><th style=\"text-align: right;\">  standard_scaled_fare_per_family_member</th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>O'Connor, Mr. Patrick                       </td><td>male  </td><td style=\"text-align: right;\">28.032</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>366713          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           84.096</td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td 
style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.096924 </td><td style=\"text-align: right;\">             -0.496597</td><td style=\"text-align: right;\">                               -0.346789</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Canavan, Mr. Patrick                        </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>364858          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>Ireland Philadelphia, PA</td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td 
style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.650312 </td><td style=\"text-align: right;\">             -0.496597</td><td style=\"text-align: right;\">                               -0.346789</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Ovies y Rodriguez, Mr. Servando             </td><td>male  </td><td style=\"text-align: right;\">28.5  </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>PC 17562        </td><td style=\"text-align: right;\">27.7208</td><td>D43    </td><td>C         </td><td>None  </td><td style=\"text-align: right;\">   189</td><td>?Havana, Cuba           </td><td>Mr          </td><td style=\"text-align: right;\">               5</td><td>D     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           28.5  </td><td style=\"text-align: right;\">                 27.7208</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   4</td><td style=\"text-align: right;\">                      0.578797</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td 
style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.0600935</td><td style=\"text-align: right;\">             -0.102369</td><td style=\"text-align: right;\">                                0.19911 </td><td>False           </td><td>False          </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Windelov, Mr. Einar                         </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>SOTON/OQ 3101317</td><td style=\"text-align: right;\"> 7.25  </td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.25  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td 
style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.650312 </td><td style=\"text-align: right;\">             -0.506468</td><td style=\"text-align: right;\">                               -0.360456</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td style=\"text-align: right;\">25    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      1</td><td>230433          </td><td style=\"text-align: right;\">26     </td><td>M      </td><td>S         </td><td>12    </td><td style=\"text-align: right;\">   nan</td><td>Deer Lodge, MT          </td><td>Mrs         </td><td style=\"text-align: right;\">               6</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            2</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           50    </td><td style=\"text-align: right;\">                 13     </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td 
style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">                    0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                 0</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">                0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">           -0.335529 </td><td style=\"text-align: right;\">             -0.136338</td><td style=\"text-align: right;\">                               -0.203281</td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                                          sex        age    sibsp    parch  ticket               fare  cabin    embarked    boat      body  home_dest                 name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title  prediction_xgb      deck_A    deck_B    deck_C    deck_D    deck_E    deck_F    deck_G    deck_M    family_size_1    family_size_2    family_size_3    family_size_4    family_size_5    family_size_6    family_size_7    family_size_8    family_size_11    name_title_Capt    name_title_Col    name_title_Countess    name_title_Don    name_title_Dona    name_title_Dr    name_title_Jonkheer    name_title_Lady    name_title_Major    name_title_Master    name_title_Miss    name_title_Mlle    name_title_Mme    name_title_Mr    name_title_Mrs    name_title_Ms    name_title_Rev    standard_scaled_age    standard_scaled_fare    standard_scaled_fare_per_family_member  prediction_svc    prediction_lr    prediction_final\n",
       "  0         3  False       O'Connor, Mr. Patrick                         male    28.032        0        0  366713             7.75    M        Q           None       nan  None                      Mr                           3  M                   0            1              1           0             84.096                    7.75                      0                         1                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.096924                -0.496597                                 -0.346789  False             False            False\n",
       "  1         3  False       Canavan, Mr. Patrick                          male    21            0        0  364858             7.75    M        Q           None       nan  Ireland Philadelphia, PA  Mr                           3  M                   0            1              1           0             63                        7.75                      0                         1                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.650312                -0.496597                                 -0.346789  False             False            False\n",
       "  2         1  False       Ovies y Rodriguez, Mr. Servando               male    28.5          0        0  PC 17562          27.7208  D43      C           None       189  ?Havana, Cuba             Mr                           5  D                   0            1              1           0             28.5                     27.7208                    0                         2                     4                        0.578797  True                     0         0         0         1         0         0         0         0                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.0600935               -0.102369                                  0.19911   False             False            True\n",
       "  3         3  False       Windelov, Mr. Einar                           male    21            0        0  SOTON/OQ 3101317   7.25    M        S           None       nan  None                      Mr                           3  M                   0            1              1           0             63                        7.25                      0                         0                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                1                 0                0                 0             -0.650312                -0.506468                                 -0.360456  False             False            False\n",
       "  4         2  True        Shelley, Mrs. William (Imanita Parrish Hall)  female  25            0        1  230433            26       M        S           12         nan  Deer Lodge, MT            Mrs                          6  M                   0            1              2           0             50                       13                         1                         0                     0                        0.145177  True                     0         0         0         0         0         0         0         1                0                1                0                0                0                0                0                0                 0                  0                 0                      0                 0                  0                0                      0                  0                   0                    0                  0                  0                 0                0                 1                0                 0             -0.335529                -0.136338                                 -0.203281  True              True             True"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# State transfer\n",
    "state_new = df_train.state_get()\n",
    "df_test.state_set(state_new)\n",
    "\n",
    "# Preview\n",
    "df_test.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's check the performance of all the individual models as well as on the ensembler, on the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-01T17:12:43.490196Z",
     "start_time": "2020-05-01T17:12:43.337368Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prediction_xgb\n",
      "Accuracy: 0.798\n",
      "f1 score: 0.744\n",
      "roc-auc: 0.785\n",
      " \n",
      "prediction_svc\n",
      "Accuracy: 0.802\n",
      "f1 score: 0.743\n",
      "roc-auc: 0.786\n",
      " \n",
      "prediction_lr\n",
      "Accuracy: 0.779\n",
      "f1 score: 0.713\n",
      "roc-auc: 0.762\n",
      " \n",
      "prediction_final\n",
      "Accuracy: 0.821\n",
      "f1 score: 0.785\n",
      "roc-auc: 0.817\n",
      " \n"
     ]
    }
   ],
   "source": [
    "pred_columns = df_train.get_column_names(regex='^prediction_')\n",
    "for i in pred_columns:\n",
    "    print(i)\n",
    "    binary_metrics(y_true=df_test.survived.values, y_pred=df_test[i].values)\n",
    "    print(' ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that our ensembler is doing a better job than any idividual model, as expected.\n",
    "\n",
    "Thanks you for going over this example. Feel free to copy, modify, and in general play around with this notebook."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
